Abstract

We investigate absolute bounds (or inequalities) on the mean and standard deviation of transformed data
values, given only a few statistics on the original set of data values. Our work applies primarily to
transformation functions whose derivatives are constant-sign for a positive range (e.g. logarithm, antilog,
square root, and reciprocal). With such functions we can often get reasonably tight absolute bounds, so that
distributional assumptions about the data needed for confidence intervals can be eliminated. We investigate a
variety of methods for obtaining such bounds, first examining bounding curves which are straight lines, then
those that are quadratic polynomials. While the problem of finding the best quadratic bound is an
optimization problem with no closed-form solution, we display a variety of closed-form quadratic bounds
which can come close to the optimal solution. We emphasize what can be done with prior knowledge of the
mean and standard deviation of the untransformed data values, but do address some other statistics too.

Abbreviated title: Absolute Bounds on Statistics of Transformed Values


1. Introduction

Standard transformations of numeric data values such as logarithm, antilog, square root, square, cube, and
reciprocal are frequently appropriate as a prelude to statistical analysis of finite data sets [7]. Sometimes,
however, the data are already aggregated into counts and means, and the original data values lost. This
happens when the original data is too large to handle and/or contains sensitive information, as with the
U.S. Census, which publishes much of its data as aggregates. We may also deliberately create "database
abstracts" of aggregate statistics to facilitate quick statistical estimates by "antisampling" methods [10].
Statistics on the transformed values cannot be calculated uniquely when the original data is so preaggregated.¹
But if we are doing exploratory data analysis [13, 6], an estimate of a statistic on the transformed data may be
all that we need. We address one set of methods for obtaining such estimates, by finding absolute
(unconditionally guaranteed) bounds on the mean and standard deviation for data under some common
transformations.

Absolute bounds are the only true "nonparametric" form of estimate, and as such have advantages.
Compared to "reasonable-guess" estimates [9], they do not require dealing with the biasedness of an estimator,
while still providing numbers close to the true answer for this category of problems. As [7] discusses,
confidence intervals for the mean and standard deviation of transformed data are difficult to obtain and
methods are subject to exceptions, and thus absolute bounds easily obtained are appealing. Tight enough
absolute bounds can be equivalent to a good estimate. An estimate of a statistic can also be logically incorrect
when bounds are tight, i.e. it may not be a statistic of any possible distribution consistent with the constraints.
Bounds are useful for other reasons as well. Some algorithms exploit only bounds, as the "branch and
bound" methods of [4] for retrieval of information from a database. Other advantages we have investigated in
previous work [10, 11, 12]. In addition, the mathematics of absolute bounds is straightforward and requires
only elementary calculus.

Our approach is to give a variety of bounds formulae for the same estimation situation. In general, we do
not know which of several bounding methods will be the best for a problem, and this suggests the program
architecture of an artificial-intelligence "production system" [1]. We can combine results by taking the
minimum of all the upper bounds, and the maximum of all the lower bounds.
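
The combination rule just described can be sketched in a few lines. This is only an illustration: the function name `combine_bounds` and the (lower, upper) pair representation are assumptions of the sketch, not part of the original work. The two sample pairs are the linear and Taylor-series bounds on the mean of the logarithms quoted in the examples of sections 3 and 5.

```python
def combine_bounds(bounds):
    """Intersect several (lower, upper) pairs that bound the same statistic.

    Hypothetical helper: each pair is assumed to come from a different
    bounding method, all of them valid for the same quantity.
    """
    lowers = [lo for lo, _ in bounds]
    uppers = [hi for _, hi in bounds]
    lower, upper = max(lowers), min(uppers)   # tightest of each kind
    if lower > upper:
        # Valid absolute bounds can never contradict one another.
        raise ValueError("bounds are mutually inconsistent")
    return lower, upper


# Linear bounds (2.635, 3.135) and Taylor-series bounds (2.94, 6.76).
combined = combine_bounds([(2.635, 3.135), (2.94, 6.76)])
print(combined)
```

Each method contributes whichever side it happens to be good at, so adding another method can only tighten the result.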


¹ Even if the data is transformed before being aggregated, there are still many reasons to want statistics on the untransformed data. To
use the example of [7], it is useful to study rainfall in the cube root of inches, but one may then be interested in statistics on the cube of
that, the meaningful quantity of total volume.

2. Our approach

In this work we examine transformation functions whose derivatives have a constant sign in the interval of
study. (We may be able to relax this restriction in particular cases, however; usually only a constant-sign
second derivative is necessary. Chapter 3 of [5] discusses detailed restrictions, in particular the notion of
function convexity, for the material we cover in section 3 below.) The so-called "power transformations" and
their inverses [2] satisfy this constant-sign restriction for positive data values. Six common power
transformations are log, antilog, square root, square, cube, and reciprocal, and these will be our primary
examples. Logarithm is particularly important because the mean of the logs is the log of the geometric mean
of a set of data values; reciprocal is also important because it provides the key to handling quotients of
random variables. To summarize the six example transformations:


Function      first deriv.   second deriv.   steepest point

$\ln(x)$      +              -               left side
$e^x$         +              +               right side
$\sqrt{x}$    +              -               left side
$x^2$         +              +               right side
$x^3$         +              +               right side
$1/x$         -              +               left side

We shall assume the following statistics on the original (untransformed) data values are known:

•  $\mu$, the mean of the values (or equivalently, the sum of the values and the number of values)

•  $m$, the minimum of the values

•  $M$, the maximum of the values

Even when we do not know the minimum and maximum exactly, we can often assume extreme "safe" values
which the minimum cannot be less than and the maximum cannot be greater than, and which we can use in
our formulae. So it is reasonable to believe we can always come up with a minimum and maximum for a set
of values.

In much of what follows we also assume the following is known:

•  $\sigma$, the standard deviation of the values, defined as $\sqrt{\sum_{1 \le j \le n} (x_j - \mu)^2 / n}$ instead of the more
conventional formula with a denominator of $n-1$

Note we use the symbols $\mu$ and $\sigma$ to emphasize that we are considering finite data populations, which are not
necessarily samples of anything.


We shall ignore linear transformations of variables as a preliminary to applying power functions, since
these can be handled trivially. For instance, $f(x) = \ln(ax + b)$ can be analyzed by defining $y = ax + b$ and
analyzing $g(y) = \ln(y)$, where $\mu_y = a\mu_x + b$ and $\sigma_y = |a|\,\sigma_x$.

Our basic idea is to find functions that are (a) entirely above, and (b) entirely below the curve of the
function on the data-value interval. We shall consider two important cases: bounding curves that are straight
lines (sections 3 and 4) and bounding curves that are second-degree polynomials (quadratics) (sections 5, 6, 7,
and 8). Subsequent sections consider extensions to this framework: use of subset means and standard
deviations in section 9, use of order statistics in section 10, use of distribution fits in section 11, and adjustments for
small populations in section 12. We conclude with some simple test experiments in section 13.

3.  Linear  bounds  on  the  mean 

3.1. Overview

For straight lines, one curve can be a tangent to the curve at some point (for convenience, the mean); the
other a secant of the curve through it at the minimum and the maximum. For curves with negative second
derivative like logarithm and square root, the tangent is an upper bound, the secant a lower bound; for curves
with positive second derivative like antilog and reciprocal, the tangent is the lower bound and the secant the
upper. These bounding lines map directly into bounds on the mean and standard deviation, for note if $ax + b
\ge f(x)$ for all $x$ in a range, $f$ some transformation function satisfying our restrictions, and $E$ denoting
expected value, then

$$E(ax + b) \ge E(f(x)), \text{ or}$$
$$aE(x) + b \ge E(f(x)), \text{ or}$$
$$a\mu + b \ge E(f(x))$$

$E(f(x))$ being the quantity we are interested in bounding.

3.2.  Linear  bounds  on  the  mean 

Let us apply these ideas to the mean of transformed values (see figure 3-1). The tangent to $f(x)$ at $\mu$ has
equation

$$y = x f'(\mu) + [f(\mu) - \mu f'(\mu)]$$

This leads to a well-known bound (generalized in [5], p. 70):

$$\mu f'(\mu) + [f(\mu) - \mu f'(\mu)] = f(\mu)$$

On the other side of the curve, the secant through the maximum and minimum forms a bound. This line has
equation

$$y = x \frac{f(M)-f(m)}{M-m} + \left[ f(m) - m \frac{f(M)-f(m)}{M-m} \right]$$

which corresponds to the bound

$$\mu \frac{f(M)-f(m)}{M-m} + \left[ f(m) - m \frac{f(M)-f(m)}{M-m} \right]$$
$$= (\mu-m) \frac{f(M)-f(m)}{M-m} + f(m)$$
$$= (1-\alpha) f(m) + \alpha f(M) = f(m) + \alpha (f(M)-f(m))$$

where $\alpha = (\mu-m)/(M-m)$

To give an example, if a set of data values ranges from 10 to 100, and the mean is 23, the mean of the
logarithms of the data values has

an upper bound of $\ln(23) = 3.135$
a lower bound of $(77/90) \ln(10) + (13/90) \ln(100) = 2.635$

Hence the geometric mean of the original data values is between $e^{2.635} = 13.9$ and $e^{3.135} = 23$. In general
from these formulae, the geometric mean is between $\mu$ and $m(M/m)^{\alpha}$; and the harmonic mean is between $\mu$
and $1/[1/m + 1/M - \mu/(mM)]$.
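
The worked example can be checked with a short sketch. The function name `linear_mean_bounds` and its `concave` flag are assumptions made for illustration; the formulas are the tangent bound $f(\mu)$ and the secant bound $(1-\alpha)f(m) + \alpha f(M)$ from this section.

```python
import math


def linear_mean_bounds(f, concave, m, M, mu):
    """Return (lower, upper) bounds on the mean of f(x).

    `concave` is True when f has a negative second derivative on [m, M]
    (tangent is the upper bound) and False when it is positive
    (tangent is the lower bound).
    """
    alpha = (mu - m) / (M - m)
    tangent = f(mu)                                # tangent at the mean
    secant = (1 - alpha) * f(m) + alpha * f(M)     # secant through m and M
    return (secant, tangent) if concave else (tangent, secant)


lower, upper = linear_mean_bounds(math.log, True, 10.0, 100.0, 23.0)
print(round(lower, 3), round(upper, 3))
# Exponentiating gives the corresponding bounds on the geometric mean.
print(round(math.exp(lower), 1), round(math.exp(upper), 1))
```

The same helper covers the convex transformations (antilog, square, cube, reciprocal) by passing `concave=False`.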

3.3.  Proof  that  tangent  at  the  mean  is  optimal 

Note that the bound obtained from taking the tangent at $\mu$ is optimal for the conditions we are assuming on
$f$. To see this, suppose we use the tangent at some other point $t$, i.e. the line $y = f(t) + (x-t)f'(t)$. Then the
mean on this bound line is

$$E[f(t) + (x-t)f'(t)] = f(t) + (\mu-t)f'(t)$$

Now we want to find the extreme value of this as $t$ varies, so we take the derivative with respect to $t$ and set it
equal to zero:

$$f'(t) - f'(t) + (\mu-t)f''(t) = 0 = (\mu-t)f''(t)$$

But since we assumed that $f$ had a constant-sign second derivative in the interval of interest, the only way this
can be zero is if $\mu = t$. Hence the only extreme value for the bound will be when we take a tangent at $\mu$ -- a
minimum for downwards-curving functions, and a maximum for upwards-curving.
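
A numerical check of this argument, as a sketch only: the grid of tangent points and the name `tangent_mean_bound` are choices made for the illustration. For the downwards-curving logarithm, the tangent-based upper bound on the mean should be smallest when the tangent point is the mean itself.

```python
import math


def tangent_mean_bound(f, fprime, t, mu):
    """Mean along the tangent to f at t: f(t) + (mu - t) f'(t)."""
    return f(t) + (mu - t) * fprime(t)


mu = 23.0
# Candidate tangent points 10.0, 10.5, ..., 100.0 (the grid contains 23.0).
candidates = [10.0 + 0.5 * k for k in range(181)]
best_t = min(
    candidates,
    key=lambda t: tangent_mean_bound(math.log, lambda x: 1.0 / x, t, mu),
)
print(best_t)
```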

3.4.  Miscellaneous  comments 

In the case of a negative second derivative, the tangent bound is an upper bound, and the secant bound a
lower bound; otherwise, the reverse. Note the two bounds are related, because they can be rewritten as

$$f[(1-\alpha)m + \alpha M] \text{ and } (1-\alpha)f(m) + \alpha f(M), \text{ where } \alpha = (\mu-m)/(M-m)$$

so they represent interchanging of a weighting and functional application.

Here is a table of the linear bounds for our six common transformations:

Function       Upper mean bound                      Lower mean bound

natural log    $\ln(\mu)$                            $(1-\alpha)\ln(m) + \alpha\ln(M)$
antilog        $(1-\alpha)e^m + \alpha e^M$          $e^{\mu}$
square root    $\sqrt{\mu}$                          $(1-\alpha)\sqrt{m} + \alpha\sqrt{M}$
square         $(1-\alpha)m^2 + \alpha M^2$          $\mu^2$
cube           $(1-\alpha)m^3 + \alpha M^3$          $\mu^3$
reciprocal     $(1-\alpha)/m + \alpha/M$             $1/\mu$

where $\alpha = [\mu-m]/[M-m]$

3.5.  Accuracy  of  linear  mean  bounds 

To illustrate effectiveness of the bounds, we tabulate the bounds for $m=10$, $M=100$, $f=\ln$, and for
$\mu$ = 19, 28, 37, 46, 55, 64, 73, 82, and 91. The "bounds range fraction" is the ratio of the distance between the
bounds to the total range of the function on the values, the difference between $f(M)$ and $f(m)$; it indicates the
quality of the estimate.

mean ($\mu$)   upper bound   lower bound   bounds range fraction

19             2.944         2.533         .179
28             3.332         2.763         .247
37             3.611         2.993         .268
46             3.829         3.224         .263
55             4.007         3.454         .240
64             4.159         3.684         .206
73             4.290         3.914         .163
82             4.407         4.145         .114
91             4.511         4.374         .059
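
The table can be regenerated from the two bound formulas. A minimal sketch (the variable names are illustrative); small differences in the last printed digit relative to the table come from rounding.

```python
import math

m, M = 10.0, 100.0
total_range = math.log(M) - math.log(m)      # f(M) - f(m) for f = ln

rows = []
for mu in range(19, 92, 9):                  # 19, 28, ..., 91
    alpha = (mu - m) / (M - m)
    upper = math.log(mu)                                       # tangent bound
    lower = (1 - alpha) * math.log(m) + alpha * math.log(M)    # secant bound
    rows.append((mu, upper, lower, (upper - lower) / total_range))

for mu, upper, lower, fraction in rows:
    print(f"{mu:3d}  {upper:.3f}  {lower:.3f}  {fraction:.3f}")
```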

It is typical that the estimates are best for extreme $\mu$, and the error is worst for a particular value inside the
range. We can calculate this value. Assume $f$ has negative second derivative (the other case is analogous).
Then we want to find the maximum of the function representing the difference of the tangent and secant
bounds, or

$$g(\mu) = f(\mu) - (1-\alpha) f(m) - \alpha f(M), \text{ where } \alpha = (\mu-m)/(M-m)$$

We find this by setting to zero the derivative with respect to $\mu$, in other words

$$dg(\mu)/d\mu = 0 = f'(\mu) + f(m)/(M-m) - f(M)/(M-m)$$
$$f'(\mu) = (f(M)-f(m))/(M-m)$$

Or in other words, the maximum error occurs for any function $f$ (that satisfies our conditions) for a mean at
the point where the tangent to $f$ is parallel to the secant through the endpoints. This makes sense because this
is the point at which $f(x)$ stops "turning away" from the secant and begins turning back towards it. Note by
Rolle's Theorem there is always one such point where the lines are parallel, and the constant sign of the
second derivative ensures that there is never more than one such point.


For specific $f$ we can tabulate the point of maximum error from this formula, as a function of $m$ and $M$.

Function               Worst $\mu$

natural log ($\ln x$)  $(M-m)/\ln(M/m)$
antilog ($e^x$)        $\ln[(e^M-e^m)/(M-m)]$
square root            $(M-m)^2/[4(\sqrt{M}-\sqrt{m})^2]$
square                 $(M+m)/2$
cube                   $\sqrt{(m^2+mM+M^2)/3}$
reciprocal             $\sqrt{mM}$

The maximum error may then be obtained as $|f(\mu_{worst}) - f(m) - (\mu_{worst}-m)(f(M)-f(m))/(M-m)|$.
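
The entries for the worst mean can be cross-checked numerically by solving $f'(\mu) = (f(M)-f(m))/(M-m)$ directly. The sketch below uses plain bisection, which is an implementation choice for illustration and not something the text prescribes; it relies on $f'$ being monotonic on the interval, which the constant-sign second derivative guarantees. The name `worst_mu` is hypothetical.

```python
import math


def worst_mu(f, fprime, m, M, iterations=200):
    """Find the mean at which tangent and secant bounds are farthest apart.

    Solves f'(x) = secant slope on [m, M] by bisection.
    """
    slope = (f(M) - f(m)) / (M - m)

    def g(x):
        return fprime(x) - slope

    lo, hi = m, M
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (g(lo) > 0) == (g(mid) > 0):   # root lies in the upper half
            lo = mid
        else:                             # root lies in the lower half
            hi = mid
    return 0.5 * (lo + hi)


m, M = 10.0, 100.0
print(worst_mu(math.log, lambda x: 1.0 / x, m, M), (M - m) / math.log(M / m))
print(worst_mu(lambda x: 1.0 / x, lambda x: -1.0 / x**2, m, M), math.sqrt(m * M))
```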


3.6.  Bounds  on  the  standard  deviation,  given  mean 

A simple application of the linear bounds on the mean of transformed values is to bounding the standard
deviation of a set of values given only the maximum ($M$), minimum ($m$), and mean ($\mu$). The variance is
computed:

$$\sum (x-\mu)^2/n = \sum x^2/n - \mu^2$$

But since square is a continuous function with a constant-sign second derivative, we can bound the second
summation, and hence the bounds on the variance are:

lower bound: $\mu^2 - \mu^2 = 0$

upper bound: $m^2 + (\mu-m)(M^2-m^2)/(M-m) - \mu^2 = \mu M + \mu m - mM - \mu^2 = (\mu-m)(M-\mu)$

And so the bounds on the standard deviation are:

lower bound: 0

upper bound: $\sqrt{(\mu-m)(M-\mu)}$

We will use this result frequently.
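
A small sketch of this bound, with a constructed data set showing that it can actually be attained. The data set (77 values at the minimum and 13 at the maximum) is an illustrative assumption chosen so that the mean is 23; `max_std` is a hypothetical name.

```python
import math
import statistics


def max_std(m, M, mu):
    """Upper bound on the population standard deviation given min, max, mean."""
    return math.sqrt((mu - m) * (M - mu))


# All values sit at the two extremes, which is the worst case for spread.
data = [10.0] * 77 + [100.0] * 13
print(statistics.fmean(data))       # mean of the constructed data
print(statistics.pstdev(data))      # population standard deviation (divisor n)
print(max_std(10.0, 100.0, 23.0))   # the bound from the formula
```

`statistics.pstdev` is used rather than `statistics.stdev` because the text defines the standard deviation with a denominator of n.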


4.  Linear  bounds  on  the  standard  deviation 

There are two methods we can use to bound the standard deviation of a set of transformed values. First, we
can use the two bounds lines used previously, bound the sum of the squares, and subtract out the effect of the
mean (i.e. use the formula $\sum x^2/n - [\sum x/n]^2$). Second, we can construct two new lines passing through $f(x)$ at
the mean of the transformed values.

4.1. Sum-of-squares bounds

Bound line $y = ax+b$ has second moment (sum of squares) equal to

$$E[(ax+b)^2] = E[a^2x^2 + 2abx + b^2] = a^2(\sigma^2 + \mu^2) + 2ab\mu + b^2 = (a\mu + b)^2 + a^2\sigma^2$$

For our two bounds lines:

tangent: $a = f'(\mu)$, $b = f(\mu) - \mu f'(\mu)$

secant: $a = (f(M)-f(m))/(M-m)$, $b = f(m) - m[(f(M)-f(m))/(M-m)]$

hence the tangent bound on the sum of the squares is

$$(\sigma^2 + \mu^2)[f'(\mu)]^2 + 2\mu f'(\mu)[f(\mu) - \mu f'(\mu)] + [f(\mu) - \mu f'(\mu)]^2$$
$$= \sigma^2[f'(\mu)]^2 + [f(\mu)]^2$$

and the secant bound is

$$\beta^2(\sigma^2 + \mu^2) + 2\mu\beta[f(m) - m\beta] + [f(m) - m\beta]^2$$

where $\beta = [f(M)-f(m)]/[M-m]$

To find bounds on the variance, then, we subtract the square of the lower bound on the mean from the
larger of these two bounds to get the upper bound; and subtract the square of the upper bound on the mean
from the smaller of these two bounds to get the lower bound. The standard deviation then has upper
bound the square root of the variance upper bound, and lower bound the square root of the variance lower
bound.

To return to our previous example, suppose $f = \ln$, $m = 10$, $M = 100$, $\mu = 23$, and also suppose $\sigma = 10$. Then
the bounds on the sum of squares are

tangent: $629 \cdot (1/23)^2 + 2 \cdot 23 \cdot (1/23) \cdot [\ln(23) - 23 \cdot (1/23)]
+ [\ln(23) - 23 \cdot (1/23)]^2 = 1.19 + 4.28 + 4.57 = 10.04$

secant: $\beta = \ln(100/10)/(100-10) = .02558$; hence bound is
$(.02558)^2 \cdot 629 + 2 \cdot 23 \cdot .02558 \cdot [\ln(10) - 10 \cdot .02558] + [\ln(10) - 10 \cdot .02558]^2
= .412 + 2.409 + 4.189 = 7.010$

Now since the bounds on the mean are 2.635 and 3.135 from our analysis in section 3, the bounds on the
square of the mean are 6.95 and 9.82. Hence bounds on the variance are 10.04 - 6.95 = 3.09 and
7.01 - 9.82 = -2.81, and bounds on the standard deviation are thus $\sqrt{3.09} = 1.76$ and 0 (a negative variance bound being replaced by zero).
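
The arithmetic of this example can be replayed with the sketch below; the helper name `second_moment_of_line` is an assumption for illustration. Carried at full floating-point precision, the tangent value comes out near 10.02 rather than the rounded 10.04 shown above, and the upper bound on the standard deviation near 1.75; the differences are rounding in the intermediate terms.

```python
import math


def second_moment_of_line(a, b, mu, sigma):
    """E[(a x + b)^2] for data with mean mu and standard deviation sigma."""
    return (a * mu + b) ** 2 + (a * sigma) ** 2


m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
f = math.log
fprime = lambda x: 1.0 / x

# Tangent at the mean and secant through the endpoints.
a_tan, b_tan = fprime(mu), f(mu) - mu * fprime(mu)
a_sec = (f(M) - f(m)) / (M - m)
b_sec = f(m) - m * a_sec

tangent_sq = second_moment_of_line(a_tan, b_tan, mu, sigma)
secant_sq = second_moment_of_line(a_sec, b_sec, mu, sigma)

mean_lower, mean_upper = a_sec * mu + b_sec, f(mu)   # ln is concave
var_upper = max(tangent_sq, secant_sq) - mean_lower ** 2
var_lower = max(0.0, min(tangent_sq, secant_sq) - mean_upper ** 2)

print(round(tangent_sq, 2), round(secant_sq, 2))
print(round(math.sqrt(var_upper), 2), math.sqrt(var_lower))
```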

4.2.  Special  standard-deviation  bounds  lines 

To bound the standard deviation of the transformed values we can use different bound lines than for the
mean. First, let us assume we know an exact value for the mean of the transformed data values -- call it $\varphi$.
Distance from $\varphi$ to each transformed data value is what needs to be linearly bounded, so we use secants
through $f(x)$ at $\varphi$ (see figure 4-1). We assume $f(x)$ is monotonic, and hence $f^{-1}(\varphi)$ is unique, so let $f^{-1}(\varphi) = \nu$
(i.e., $\varphi = f(\nu)$). So to get an upper bound on the standard deviation of the transformed values, we use a line
below $f(x)$ for $x<\nu$, and above for $x>\nu$; and to get a lower bound, a line above $f(x)$ for $x<\nu$, and below for $x>\nu$.
(Vice versa for a monotonically decreasing $f(x)$.) Now since we assume $f(x)$ has a constant-sign second
derivative in the interval, the line segment from $m$ to $\nu$ must lie constantly to one side of $f(x)$, and similarly the
line segment from $\nu$ to $M$. Hence choose the extensions of those two line segments into lines as our bounds
lines. These lines have equations

$$y = (x-\nu)(f(\nu)-f(m))/(\nu-m) + f(\nu)$$
$$y = (x-\nu)(f(M)-f(\nu))/(M-\nu) + f(\nu)$$

Now:

$$\sigma_f^2 = E[(y-f(\nu))^2]$$

And if $y = k(x-\nu) + f(\nu)$, where $k$ is the slope of the line, this is:

$$E[(k(x-\nu) + f(\nu) - f(\nu))^2] = E[k^2(x-\nu)^2] = k^2[\sigma^2 + (\nu-\mu)^2]$$

Hence using the formula for the variance, the second moment about the mean, the variance of the
transformed values is bounded by

$$[\sigma^2 + (\nu-\mu)^2] [(f(\nu)-f(m))/(\nu-m)]^2$$

and

$$[\sigma^2 + (\nu-\mu)^2] [(f(M)-f(\nu))/(M-\nu)]^2$$

Hence the standard deviation is bounded by

$$\sqrt{\sigma^2 + (\nu-\mu)^2}\, [(f(\nu)-f(M))/(\nu-M)] \text{ and}$$
$$\sqrt{\sigma^2 + (\nu-\mu)^2}\, [(f(\nu)-f(m))/(\nu-m)]$$

They are upper and lower bounds respectively for curves with positive second derivative, and vice versa for
negative second derivative. Hence the bounds are just an "adjusted" standard deviation of the original values
times the slopes of the lines from the mean of the transformed values to the minimum and maximum on the
interval.

Note since

$\sigma f'(\nu)$ is between $\sigma[(f(\nu)-f(m))/(\nu-m)]$

and $\sigma[(f(M)-f(\nu))/(M-\nu)]$, for $f''(x)$ constant-sign

a rough approximation of the standard deviation of the transformed values (as opposed to bound) may always
be obtained from $\sigma f'(\nu)$, and this will be increasingly good an approximation as $\sigma$ gets smaller. Also note that
for a narrow range of mean bounds, the difference between our standard deviation bounds is a rough
approximation of the second derivative of $f$ at $\nu$:

$$\sigma[(f(M)-f(\nu))/(M-\nu)] - \sigma[(f(\nu)-f(m))/(\nu-m)] \approx 2\sigma f''(\nu)$$

So the width of the bounds varies proportionately with the magnitude of the second derivative at the mean of
the transformed values.

4.3. Handling inexact transform means

But this assumes we know $\nu$, the value that maps to the mean of the transformed values, exactly. We do for
the square function, for instance. Otherwise there is an adjustment we can make. Let the upper and lower bounds
on the value $\nu$ which maps to the transform mean be $\nu_U$ and $\nu_L$. Then the bounds on the variance of the
transformed values are

$$\max\left[ \max_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2,\ \max_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(M))/(\nu-M)]^2 \right]$$

and

$$\min\left[ \min_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2,\ \min_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(M))/(\nu-M)]^2 \right]$$

Since $\max(\max(g(x)s(x)), \max(h(x)s(x))) = \max(\max(g(x)s(x), h(x)s(x))) = \max(\max(g(x),h(x))\,s(x))$, we
can simplify:

$$\max_{\nu_L \le \nu \le \nu_U} \left[ \max\left( [(f(\nu)-f(M))/(\nu-M)]^2, [(f(\nu)-f(m))/(\nu-m)]^2 \right) [\sigma^2 + (\mu-\nu)^2] \right]$$

and

$$\min_{\nu_L \le \nu \le \nu_U} \left[ \min\left( [(f(\nu)-f(M))/(\nu-M)]^2, [(f(\nu)-f(m))/(\nu-m)]^2 \right) [\sigma^2 + (\mu-\nu)^2] \right]$$

First, suppose $f(x)$ is monotonically increasing (like all of our six important functions except $1/x$). If the
second derivative is positive, then the inner max is the first subexpression in the first bound above, and the
inner min is the second subexpression in the second bound. We can then rewrite the formulae:

$$\max_{\nu_L \le \nu \le \nu_U} [(f(\nu)-f(M))/(\nu-M)]^2 [\sigma^2 + (\mu-\nu)^2]$$

and

$$\min_{\nu_L \le \nu \le \nu_U} [(f(\nu)-f(m))/(\nu-m)]^2 [\sigma^2 + (\mu-\nu)^2]$$

Note that these represent the product of two functions which are both monotonically increasing with respect
to $\nu$. For a monotonically increasing $f(x)$, $\mu$ is a lower bound on $\nu$. The product of two monotonically
increasing functions is a monotonically increasing function. The max of a monotonically increasing function
is the value at the rightmost point, and the min is at the leftmost point. So the revised bounds on the variance
of the transformed values, given $f(x)$ increasing and with positive second derivative, are

upper: $[(f(\nu_U)-f(M))/(\nu_U-M)]^2 [\sigma^2 + (\mu-\nu_U)^2]$

lower: $[(f(\nu_L)-f(m))/(\nu_L-m)]^2 [\sigma^2 + (\mu-\nu_L)^2]$

Similarly if $f(x)$ has a negative second derivative (again, assuming the first derivative is positive), we can show
by analogous reasoning that the bounds are:

upper: $[(f(\nu_L)-f(m))/(\nu_L-m)]^2 [\sigma^2 + (\mu-\nu_L)^2]$

lower: $[(f(\nu_U)-f(M))/(\nu_U-M)]^2 [\sigma^2 + (\mu-\nu_U)^2]$

Using our example of $f = \ln$, $m = 10$, $M = 100$, $\mu = 23$, $\sigma = 10$, we use the previously found linear bounds on
the mean of the logarithms, which give $\nu_U = e^{3.135} = 23$ and $\nu_L = e^{2.635} = 13.9$. Hence bounds on the standard deviation
of the logarithms are:

$$\sqrt{10^2 + 9.1^2}\,[(2.635-\ln(10))/(13.9-10)] = 1.16$$
$$\sqrt{10^2 + 0^2}\,[(\ln(100)-3.135)/(100-23)] = .19$$

both being better than the sum-of-squares bounds in section 4.1.
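
The two slope-based bounds in this example can be recomputed with the following sketch for an increasing function with negative second derivative. The function name and argument layout are assumptions for illustration, and the inputs are the linear mean bounds carried at full precision, so the upper value comes out near 1.14 rather than a figure obtained from rounded intermediates.

```python
import math


def sd_bounds_increasing_concave(f, f_inverse, m, M, mu, sigma, phi_lower, phi_upper):
    """Slope-based (lower, upper) bounds on the standard deviation of f(x).

    phi_lower and phi_upper bound the mean of the transformed values;
    f is assumed increasing with negative second derivative on [m, M],
    and phi_lower is assumed strictly greater than f(m).
    """
    nu_lower, nu_upper = f_inverse(phi_lower), f_inverse(phi_upper)
    # Upper bound: steeper chord (from m), evaluated at the leftmost nu.
    upper = math.sqrt(sigma**2 + (mu - nu_lower) ** 2) * (f(nu_lower) - f(m)) / (nu_lower - m)
    # Lower bound: shallower chord (to M), evaluated at the rightmost nu.
    lower = math.sqrt(sigma**2 + (mu - nu_upper) ** 2) * (f(M) - f(nu_upper)) / (M - nu_upper)
    return lower, upper


m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
alpha = (mu - m) / (M - m)
phi_lower = (1 - alpha) * math.log(m) + alpha * math.log(M)   # secant bound
phi_upper = math.log(mu)                                       # tangent bound

sd_lower, sd_upper = sd_bounds_increasing_concave(
    math.log, math.exp, m, M, mu, sigma, phi_lower, phi_upper
)
print(round(sd_lower, 3), round(sd_upper, 3))
```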

Unfortunately, revised formulae for monotonically decreasing functions are not as easy. The partial
derivative of the bounds expressions must be set to zero and inverted. Consider the case for the upper bound
for a curve with a negative first derivative (like $1/x$):

$$0 = \frac{\partial}{\partial\nu} \left[ [(f(\nu)-f(M))/(\nu-M)]^2 [\sigma^2 + (\mu-\nu)^2] \right]$$

$$0 = 2[(f(\nu)-f(M))/(\nu-M)] \cdot [(f'(\nu)(\nu-M) - (f(\nu)-f(M))) / (\nu-M)^2] \cdot [\sigma^2 + (\mu-\nu)^2]$$
$$\quad + [(f(\nu)-f(M))/(\nu-M)]^2 \cdot (-2(\mu-\nu))$$

$$[(f'(\nu)(\nu-M) - (f(\nu)-f(M))) / (\nu-M)^2] \cdot [\sigma^2 + (\mu-\nu)^2] = [(f(\nu)-f(M))/(\nu-M)] \cdot (\mu-\nu)$$

which is then solved for $\nu$, and the value substituted in the function differentiated above to obtain the bound.
Analogously, the other bound is found by solving

$$[(f'(\nu)(\nu-m) - (f(\nu)-f(m))) / (\nu-m)^2] \cdot [\sigma^2 + (\mu-\nu)^2] = [(f(\nu)-f(m))/(\nu-m)] \cdot (\mu-\nu)$$

4.4.  Evaluating  standard-deviation  bounds 

The sum-of-squares bounds of section 4.1 are hard to evaluate, but we can examine the slope-based bounds
of the last section, provided we assume $\nu$ is known exactly. We are interested in knowing the largest possible
difference between the upper and lower bounds for an exact $\nu$, or the maximum of

$$D(\nu) = \sigma_2 \left[ [(f(M)-f(\nu))/(M-\nu)] - [(f(\nu)-f(m))/(\nu-m)] \right]$$

where $\sigma_2^2 = \sigma^2 + (\mu-\nu)^2$

For four of our functions -- $x^2$, $x^3$, $1/x$, and $\sqrt{x}$ -- this is straightforward to find:

•  $x^2$: $D(\nu) = \sigma_2[(\nu + M) - (\nu + m)] = \sigma_2(M-m)$, so $D$ is constant.

•  $x^3$: $D(\nu) = \sigma_2[(\nu^2 + \nu M + M^2) - (\nu^2 + \nu m + m^2)] = \sigma_2[\nu(M-m) + (M^2-m^2)]$. This has
maximum at $\nu = M$ of $\sigma_2(M-m)(2M+m)$.

•  $1/x$: $D(\nu) = \sigma_2[1/(\nu m) - 1/(\nu M)] = \sigma_2(1/m - 1/M)/\nu$. This has a maximum at $\nu = m$ of
$\sigma_2(M-m)/(m^2M)$.

•  $\sqrt{x}$: $D(\nu) = \sigma_2[1/(\sqrt{\nu} + \sqrt{M}) - 1/(\sqrt{\nu} + \sqrt{m})] = -\sigma_2(\sqrt{M}-\sqrt{m})/(\nu + (\sqrt{M}+\sqrt{m})\sqrt{\nu} +
\sqrt{mM})$. This has a maximum magnitude at $\nu = m$ of $\sigma_2(\sqrt{M}-\sqrt{m})/(2\sqrt{m}(\sqrt{M}+\sqrt{m}))$.

For transcendental functions like $\ln(x)$ and $e^x$ we can attack the problem with an infinite series obtained
from the Taylor series expansion of the function about $\nu$; when the curve is relatively flat in the interval of
interest, the approximation will be good.

$$D(\nu) = \sigma_2 \left[ ((f(\nu)-f(M))/(\nu-M)) - ((f(\nu)-f(m))/(\nu-m)) \right]$$

Let us expand the first quotient in the brackets into a series.

$$(f(\nu)-f(M))/(\nu-M) = (f(\nu) - [f(\nu) + (\nu-M)f'(\nu) + (\nu-M)^2 f''(\nu)/2! + \ldots])/(\nu-M)$$
$$= -(f'(\nu) + (\nu-M)f''(\nu)/2! + (\nu-M)^2 f'''(\nu)/3! + \ldots)$$
$$= -\sum_{i=1}^{\infty} [(\nu-M)^{i-1} f^{(i)}(\nu)/i!]$$

Hence

$$D(\nu) = \sigma_2 \sum_{i=1}^{\infty} \left[ [(\nu-m)^{i-1} f^{(i)}(\nu)/i!] - [(\nu-M)^{i-1} f^{(i)}(\nu)/i!] \right]$$

We need to take the derivative with respect to $\nu$ of this in order to see if it has a maximum in the interval. The
condition for the maximum is thus:

$$0 = \sum_{i=1}^{\infty} \left[ [(\nu-m)^{i-1} - (\nu-M)^{i-1}] f^{(i+1)}(\nu) / ((i+1) \cdot (i-1)!) \right]$$

To approximate this we can take the first few terms:

$$0 = (M-m)f'''(\nu)/2! + (2\nu(M-m) - (M+m)(M-m)) f''''(\nu)/3!$$

$$0 = (M-m)[f'''(\nu)/2 + (2\nu-m-M) f''''(\nu)/6]$$

As an example, consider $f(x) = e^x$. Then:

$$0 = (M-m)[e^{\nu}/2 + (2\nu-m-M)e^{\nu}/6] = (M-m)e^{\nu}(1/2 + \nu/3 - m/6 - M/6)$$

which can be solved iteratively for $\nu$.

5. Quadratic bounds on means: Taylor-series methods

5.1. The problem

A straight line is not a very good approximation to a function with a strong curvature. An obvious next
step to improve our estimates of the mean is to construct quadratic bounds lines of the form $y = ax^2 + bx + c$
and compute the mean along those:

$$E[ax^2 + bx + c] = a(\sigma^2 + \mu^2) + b\mu + c$$

However, finding quadratic bounds curves is not as easy as it might seem. We generally cannot just use the
Taylor series about some point of the curve, as with the estimates (not bounds) of [9], because while such
approximations may stay close to the curve of the actual function on some range, they may be above and
below it at different places. For instance, take the 3-term Taylor series for $f(x) = \ln(x)$ about $x = 1$, which is

$$0 + (x-1)(1/1) + (x-1)^2(-1/1^2)/2 = -.5x^2 + 2x - 1.5$$

At $x = 2$ this is .5, below the logarithm curve value $\ln(2) = .69$, but at $x = .5$ this is -.625, above the logarithm
curve value $\ln(.5) = -.69$. Hence the approximation curve crosses $\ln(x)$, and cannot be used as a bound on the
values of the latter.
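
The crossing is easy to confirm numerically. A minimal sketch (the function name is illustrative):

```python
import math


def taylor2_ln_about_1(x):
    """Three-term Taylor series of ln(x) about x = 1."""
    return -0.5 * x * x + 2.0 * x - 1.5


for x in (0.5, 2.0):
    approx, exact = taylor2_ln_about_1(x), math.log(x)
    side = "above" if approx > exact else "below"
    print(f"x = {x}: series = {approx:.3f}, ln(x) = {exact:.3f}, series is {side} the curve")
```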
5.2. Quadratic bounding by vertical shifting

There is a way we can use arbitrary polynomial approximations to get bounds: we can shift the
approximation curve upwards or downwards until it no longer crosses the target curve in the interval. To put
this formally for the Taylor series, we want to bound $f(x)$ on the interval $m$ to $M$ by the function

$$h(x) = f(t) + (x-t)f'(t) + .5 f''(t)(x-t)^2 + K$$

where $t$ is some arbitrary point in the interval, and $K$ is some constant. If we choose $t = \mu$ (for quadratic
bounds a convenient, but not necessarily best-bound point), then the mean of the approximation function is

$$E[h(x)] = f(\mu) + (\mu-\mu)f'(\mu) + .5[\sigma^2 + \mu^2 - 2\mu^2 + \mu^2] f''(\mu) + K$$
$$= f(\mu) + .5\sigma^2 f''(\mu) + K$$

If we do not choose $t = \mu$ the formula is slightly more complicated:

$$f(t) + (\mu-t)f'(t) + .5(\sigma^2 + (\mu-t)^2)f''(t) + K$$

Note for the particular function $f(x) = x^2$ the Taylor series has only three terms, and hence an exact formula
for the mean of the square of a set of data values is

$$\mu^2 + .5\sigma^2(2) = \mu^2 + \sigma^2$$

The upper and lower bounds are then found from substituting $K_U$ and $K_L$, which are respectively the
maximum and minimum values in the interval of study of the error of the approximation $e(x)$, defined as

$$e(x) = f(x) - f(t) - (x-t)f'(t) - .5(x-t)^2 f''(t)$$

Since the interval is finite, we cannot just find the zeros of the derivative of $e(x)$. Zeros have to lie within the
data-value interval, and they must be compared to two other points, the function values at the maximum and
minimum of the range. In other words:

$K_U$ is $\max[e(m), e(M), e(z_1), e(z_2), \ldots]$

$K_L$ is $\min[e(m), e(M), e(z_1), e(z_2), \ldots]$

where the $z_i$ are all zeros of $e'(x)$ within the interval. To find the zeros:

$$de/dx = f'(x) - f'(t) - (x-t)f''(t) = 0$$
$$[f'(x)-f'(t)]/(x-t) = f''(t)$$

We always know one solution of the above equation, $x = t$, because

$$(f'(t) - f'(t)) - (t-t) f''(t) = 0$$

But there are no other solutions for functions with constant-sign derivatives, implying no other local maxima
or minima for a Taylor-series approximation. To see this, note the equation says the slope of $f'(x)$ from $t$ to
some other point must be equal to the derivative of $f'(x)$ at $t$. But this cannot occur if the second derivative of
$f'(x)$ (i.e., $f'''(x)$) is constant in sign, because then each value of the first derivative of $f'(x)$ (i.e., $f''(x)$) can occur at most
once.

Hence we can write the Taylor-series quadratic bound in general as (noting $e(t) = 0$):
upper bound: $f(t) + (\mu - t)\,f'(t) + \tfrac{1}{2}[\sigma^2 + (\mu - t)^2]\,f''(t) + \max(e(m), e(M), 0)$
lower bound: $f(t) + (\mu - t)\,f'(t) + \tfrac{1}{2}[\sigma^2 + (\mu - t)^2]\,f''(t) + \min(e(m), e(M), 0)$

For particular functions $f$ we may be able to rule out some possibilities for the min and max. For instance, for $f(x) = x^3$, $e(x)$ is just the fourth Taylor-series term, $6(x - t)^3/6 = (x - t)^3$, so $e(M) > 0$ and $e(m) < 0$, and the bounds are

upper bound: $t^3 + 3t^2(\mu - t) + \tfrac{1}{2}[\sigma^2 + (\mu - t)^2](6t) + (M - t)^3 = 3t^2(M - \mu) + 3t[\sigma^2 + \mu^2 - M^2] + M^3$
lower bound: $t^3 + 3t^2(\mu - t) + \tfrac{1}{2}[\sigma^2 + (\mu - t)^2](6t) + (m - t)^3 = 3t^2(m - \mu) + 3t[\sigma^2 + \mu^2 - m^2] + m^3$

Similarly, $e(m) < 0$ from analyzing the Taylor series for logarithm and square root; $0 < e(m)$ for reciprocal; and $0 < e(M)$ for antilog.

5.3.  An  example 

To illustrate, use our previous example of $f = \ln$, $m = 10$, $M = 100$, $t = \mu = 23$, and $\sigma = 10$. Take the Taylor series about $\mu$. From the preceding we know that the only possible extremes occur at $m$, $M$, and $\mu$, so note:

$e(x) = \ln(x) - [\ln(23) + (x - 23)/23 - .5(x - 23)^2/23^2]$
$e(m) = \ln(10) - [3.14 - .56 - .16] = 2.30 - 2.42 = -.12 = K_L$
$e(\mu) = \ln(23) - \ln(23) = 0$
$e(M) = \ln(100) - [3.14 + 3.35 - 5.6] = 4.6 - 0.9 = 3.7 = K_U$

These are the bound offsets we have to add to the estimate of the mean of
$\ln(23) - .5(10^2/23^2) = 3.06$

So we estimate the mean of the logarithms as 3.06, with an upper bound of $3.06 + \max(-.12, 0, 3.7) = 6.76$, and a lower bound of $3.06 + \min(-.12, 0, 3.7) = 2.94$. The upper bound is much worse than the linear upper bound (3.135), but the lower bound is better than the linear lower bound (2.635).
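
The general Taylor-series bound and this worked example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `taylor_quadratic_bounds` and its argument names are hypothetical, the derivatives are supplied by hand, and the third derivative of the function is assumed constant in sign on the interval so that only $e(m)$, $e(M)$ and $e(t) = 0$ need to be compared. The printed values should agree with the worked figures to within the rounding used there.

```python
import math

def taylor_quadratic_bounds(f, d1, d2, t, mu, sigma, m, M):
    """Return (lower, estimate, upper) for the mean of f over data in [m, M].

    Assumes the third derivative of f has constant sign on [m, M], so the
    error e(x) has no interior extremum other than e(t) = 0.
    """
    estimate = f(t) + (mu - t) * d1(t) + 0.5 * (sigma ** 2 + (mu - t) ** 2) * d2(t)

    def e(x):
        # Error of the quadratic Taylor polynomial about t.
        return f(x) - f(t) - (x - t) * d1(t) - 0.5 * (x - t) ** 2 * d2(t)

    k_upper = max(e(m), e(M), 0.0)
    k_lower = min(e(m), e(M), 0.0)
    return estimate + k_lower, estimate, estimate + k_upper

# Worked example: f = ln, m = 10, M = 100, t = mu = 23, sigma = 10.
lower, estimate, upper = taylor_quadratic_bounds(
    math.log, lambda x: 1.0 / x, lambda x: -1.0 / x ** 2,
    t=23.0, mu=23.0, sigma=10.0, m=10.0, M=100.0)
print(lower, estimate, upper)
```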

5.4.  Choosing  the  optimal  point  for  the  Taylor  series 

The question arises as to the best value of $t$ for getting an upper or lower bound. Analysis requires careful preconditions, but we can often do something like this. Suppose that $e(M)$ is the maximum value of $e(x)$ on the interval of study. The upper bound on the transformed mean from taking the Taylor series about $t$ is then the estimate plus $e(M)$:

$$f(t) + (\mu - t)\,f'(t) + \tfrac{1}{2}[\sigma^2 + (\mu - t)^2]\,f''(t) + e(M)$$
$$= f(t) + (\mu - t)\,f'(t) + \tfrac{1}{2}[\sigma^2 + (\mu - t)^2]\,f''(t) + [f(M) - f(t) - (M - t)\,f'(t) - \tfrac{1}{2}(M - t)^2 f''(t)]$$
$$= f(M) + (\mu - M)\,f'(t) + \tfrac{1}{2}[\sigma^2 + \mu^2 - M^2 - 2\mu t + 2Mt]\,f''(t)$$

We want to minimize this bound with respect to $t$, i.e. we want:

$$0 = \partial/\partial t\,\left[f(M) + (\mu - M)\,f'(t) + \tfrac{1}{2}[\sigma^2 + \mu^2 - M^2 - 2\mu t + 2Mt]\,f''(t)\right]$$
$$0 = (\mu - M)\,f''(t) + \tfrac{1}{2}[\sigma^2 + \mu^2 - M^2 - 2\mu t + 2Mt]\,f'''(t) + (M - \mu)\,f''(t)$$
$$0 = \tfrac{1}{2}[\sigma^2 + \mu^2 - M^2 - 2\mu t + 2Mt]\,f'''(t)$$

For a function with derivatives constant in sign, this can only be zero if the expression in brackets is zero:
$$0 = \sigma^2 + \mu^2 - M^2 - 2\mu t + 2Mt$$
$$t = [\sigma^2 + \mu^2 - M^2] / 2(\mu - M)$$
$$t = [\mu + M - \delta_M] / 2, \quad \text{where } \delta_M = \sigma^2/(M - \mu)$$

Hence, substituting back into the expression for the bound, the second-derivative term must disappear, and we get

$$f(M) + (\mu - M)\,f'((\mu + M - \delta_M)/2)$$
which is an upper bound provided $e(M) > 0$ and $e(M) > e(m)$.

By similar analysis we can show that

$$t = [\mu + m + \delta_m] / 2, \quad \text{where } \delta_m = \sigma^2/(\mu - m)$$

is the best $t$ for obtaining the other bound on $e(x)$ on the interval of interest, leading to a lower bound of
$$f(m) + (\mu - m)\,f'((\mu + m + \delta_m)/2)$$

provided $e(m) < 0$ and $e(m) < e(M)$. For $\sigma = 0$ the upper and lower bounds occur at $t = (\mu + M)/2$ and $t = (\mu + m)/2$ respectively; and for $\sigma$ at its maximum, $\sqrt{(M - \mu)(\mu - m)}$, these are both $(M + m)/2$.

So for the logarithm function (where $e(m) < 0$ necessarily) with $\mu = 23$, $m = 10$, $M = 100$, and $\sigma = 10$, this gives for a lower bound $t = (23 + 10 + 100/(23 - 10))/2 = 20.3$, and the bound is
$$f(m) + (\mu - m)\,f'(20.3) = \ln(10) + 13/20.3 = 2.30 + .640 = 2.94$$
which is negligibly better than for the series about $\mu$, but may represent an improvement in other cases. In general, the Taylor series approach works well for narrow intervals of interest or intervals where $f(x)$ is rather flat. We can, however, use order statistics to improve Taylor-series bounds; see section 10.
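
As a rough check on the closed forms for the best expansion point, the following Python sketch evaluates them for the running logarithm example. The helper names `optimal_taylor_lower` and `optimal_taylor_upper` are illustrative only, and the preconditions stated above ($e(m) < 0$ and $e(m) < e(M)$ for the lower bound, $e(M) > 0$ and $e(M) > e(m)$ for the upper bound) are assumed rather than checked.

```python
import math

def optimal_taylor_lower(f, d1, mu, sigma, m):
    """Best expansion point t and lower bound f(m) + (mu - m) f'(t).

    Assumes e(m) < 0 and e(m) < e(M) for the function f.
    """
    delta_m = sigma ** 2 / (mu - m)
    t = (mu + m + delta_m) / 2.0
    return t, f(m) + (mu - m) * d1(t)

def optimal_taylor_upper(f, d1, mu, sigma, M):
    """Best expansion point t and upper bound f(M) + (mu - M) f'(t).

    Assumes e(M) > 0 and e(M) > e(m) for the function f.
    """
    delta_M = sigma ** 2 / (M - mu)
    t = (mu + M - delta_M) / 2.0
    return t, f(M) + (mu - M) * d1(t)

# Running example: f = ln, mu = 23, sigma = 10, m = 10, M = 100.
t_low, low = optimal_taylor_lower(math.log, lambda x: 1.0 / x, 23.0, 10.0, 10.0)
t_up, up = optimal_taylor_upper(math.log, lambda x: 1.0 / x, 23.0, 10.0, 100.0)
print(t_low, low, t_up, up)
```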

6.  Quadratic  bounds  on  means  from  Lagrange  interpolation 

Taylor series approximations deteriorate on the edges of an approximation interval. We are more concerned with signed maximum deviation of the approximation from the function (a concept distinct from the $L_\infty$ approximation, which minimizes the absolute value of deviations), and a better quadratic for our purposes comes from the Lagrange interpolation method using the Chebyshev interpolation points. For a quadratic we need three points to fit the curve through, giving:

$$h(x) = \frac{f(p)(x - q)(x - r)}{(p - q)(p - r)} + \frac{f(q)(x - p)(x - r)}{(q - p)(q - r)} + \frac{f(r)(x - p)(x - q)}{(r - p)(r - q)}$$

$$h(x) = \frac{8}{3(M - m)^2}\left[f(p)(x - q)(x - r) - 2f(q)(x - p)(x - r) + f(r)(x - p)(x - q)\right]$$

where $p = m + (.5 - \sqrt{3}/4)(M - m)$, $q = (M + m)/2$, and $r = m + (.5 + \sqrt{3}/4)(M - m)$.

Using our example of $f = \ln$, $m = 10$, $M = 100$, $\mu = 23$, and $\sigma = 10$, we have:

$p = 16.029$, $q = 55.0$, $r = 93.971$; $\ln(p) = 2.7744$, $\ln(q) = 4.0073$, $\ln(r) = 4.5430$
$h(x) = -.0002295x^2 + .04794x + 2.0648$

Hence an estimate of the mean of the logarithms for this example is

$-.0002295(10^2 + 23^2) + .04794(23) + 2.0648 = -.1444 + 1.1026 + 2.0648 = 3.0230$

This is an estimate, not a bound. Just as with Taylor-series polynomials, we can get bounds from this by knowing the extrema (maxima and minima) of the error curve on the interval of interest. For Chebyshev (as opposed to Taylor-series) approximations there are two places in the interval where $e'(x) = 0$, and hence one local maximum and one local minimum. We can find these by solving the error curve derivative explicitly; for logarithm and cube this is a quadratic equation, for square root and reciprocal a cubic, and for exponential a transcendental equation. For example, for our $\ln(x)$ example:

$d/dx\,[\ln(x) - (-.0002295x^2 + .04794x + 2.0648)] = 1/x + .000459x - .04794 = 0$
hence $.000459x^2 - .04794x + 1 = 0$
and $x = [.04794 \pm \sqrt{.04794^2 - .001836}] / .000918 = 28.80$ and $75.64$
So the extrema of $e(x)$ on the interval can occur at only four points: $m = 10$, $M = 100$, 28.80, and 75.64. Computing $e(x)$ there:

$e(10) = -.2187$, $e(100) = .04137$, $e(28.80) = .10526$, $e(75.64) = -.05193$

And hence the Lagrange-Chebyshev quadratic bounds on the mean of the transformed values are:

upper bound: $3.0230 + \max(-.2187, .04137, .10526, -.05193) = 3.1283$
lower bound: $3.0230 + \min(-.2187, .04137, .10526, -.05193) = 2.8043$

which are better than the linear bounds of 3.135 and 2.635 (and hence the Taylor series bounds too).
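
The Lagrange-Chebyshev construction can be sketched as below. The function names are hypothetical, and the extrema of the error curve are located here by sampling the interval on a dense grid rather than by solving $e'(x) = 0$ explicitly as in the text; that substitution is an assumption made for brevity, and it can slightly understate the true extrema between grid points.

```python
import math

def chebyshev_quadratic(f, m, M):
    """Coefficients (a, b, c) of the quadratic through the three Chebyshev points."""
    half = math.sqrt(3.0) / 4.0 * (M - m)
    q = (M + m) / 2.0
    p, r = q - half, q + half
    scale = 8.0 / (3.0 * (M - m) ** 2)
    fp, fq, fr = f(p), f(q), f(r)
    a = scale * (fp - 2.0 * fq + fr)
    b = scale * (-fp * (q + r) + 2.0 * fq * (p + r) - fr * (p + q))
    c = scale * (fp * q * r - 2.0 * fq * p * r + fr * p * q)
    return a, b, c

def lagrange_chebyshev_bounds(f, mu, sigma, m, M, n=10000):
    """Return (lower, estimate, upper) for the mean of f over data in [m, M]."""
    a, b, c = chebyshev_quadratic(f, m, M)
    estimate = a * (sigma ** 2 + mu ** 2) + b * mu + c
    # Grid search for the extrema of e(x) = f(x) - h(x); endpoints are included.
    errors = []
    for i in range(n + 1):
        x = m + (M - m) * i / n
        errors.append(f(x) - (a * x * x + b * x + c))
    return estimate + min(errors), estimate, estimate + max(errors)

# Running example: f = ln, mu = 23, sigma = 10, m = 10, M = 100.
print(chebyshev_quadratic(math.log, 10.0, 100.0))
print(lagrange_chebyshev_bounds(math.log, 23.0, 10.0, 10.0, 100.0))
```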

7.  Quadratic  bounds  on  means;  one-sided  methods 

There are quadratic methods that avoid having to find the extrema of the error function in computing an approximation, by constructing approximation curves entirely above or entirely below the target function in the interval. We can do this if we can position the points of intersection of the approximation curve $ax^2 + bx + c$ with $f(x)$ to lie either (a) outside the interval, or (b) tangent at some point. Among our six demonstration functions, reciprocal and cube lead to cubic polynomial equations.

7.1.  Intersection  and  tangent  positioning:  reciprocal 

Consider reciprocal first. The error curve is
$$e(x) = 1/x - ax^2 - bx - c$$

and it can have at most three zeros, which are the solutions to
$$0 = ax^3 + bx^2 + cx - 1$$

To keep the approximation curve "close", we can put a point of tangency at some $t$ inside the interval (i.e., a double zero at $t$) and another zero at $M$. We can write this cubic as $(x/t - 1)^2(x/M - 1)$, which approaches $-\infty$ for small $x$, $+\infty$ for large $x$, reaches a local maximum at $x = t$, a local minimum at some larger $x$ value, and then crosses zero permanently at $x = M$. Then we want

$$(x/t - 1)(x/t - 1)(x/M - 1) = ax^3 + bx^2 + cx - 1$$

$$x^3/t^2M - x^2(2/tM + 1/t^2) + x(2/t + 1/M) - 1 = ax^3 + bx^2 + cx - 1$$

$$a = 1/t^2M, \quad b = -(2/tM + 1/t^2), \quad c = 2/t + 1/M$$

So the quadratic lower bound on the mean is

$$(\sigma^2 + \mu^2)/t^2M - (2/tM + 1/t^2)\mu + 2/t + 1/M$$

We are interested in the best lower bound possible, i.e. the largest. We can find this by setting to zero the partial derivative of the preceding with respect to $t$:

$$0 = -2(\sigma^2 + \mu^2)/t^3M + (2/t^2M + 2/t^3)\mu - 2/t^2$$

$$0 = -(\sigma^2 + \mu^2)/M + (t/M + 1)\mu - t$$
$$t = [\mu - (\sigma^2 + \mu^2)/M] / (1 - \mu/M)$$

$$= \mu - \sigma^2/(M - \mu) = \mu - \delta_M, \quad \text{where } \delta_M = \sigma^2/(M - \mu)$$

So for $\sigma = 0$ this is $\mu$; for $\sigma$ a maximum, namely $\sqrt{(M - \mu)(\mu - m)}$ (see section 3.6), this is $m$. We saw this $\delta_M$ term before in a different kind of quadratic approximation in section 5.4.

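Substituting this $t$ in the bound formula, we get a quadratic lower bound of

$$\frac{(\sigma^2 + \mu^2 - M\mu) + 2(\mu - \delta_M)(M - \mu) + (\mu - \delta_M)^2}{M(\mu - \delta_M)^2}$$
$$= \frac{1}{M} + \frac{(\sigma^2 + \mu^2 - M\mu) + 2(M\mu - \mu^2 - \sigma^2)}{M\,(\mu - \sigma^2/(M - \mu))^2}$$

$$= \frac{1}{M} + \frac{M\mu - \sigma^2 - \mu^2}{M\,(M\mu - \sigma^2 - \mu^2)^2/(M - \mu)^2}$$

$$= \frac{1}{M} + \frac{(M - \mu)^2}{M\,(M\mu - \sigma^2 - \mu^2)}$$

$$= \frac{1}{M}\cdot\frac{M\mu - \sigma^2 - \mu^2 + M^2 - 2M\mu + \mu^2}{M\mu - \sigma^2 - \mu^2}$$

$$= \frac{1}{M}\cdot\frac{M^2 - M\mu - \sigma^2}{M\mu - \mu^2 - \sigma^2}$$

$$= \frac{1}{M}\cdot\frac{M - \delta_M}{\mu - \delta_M}$$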

Note that when $\sigma = 0$ this is equal to $(1/M)(M/\mu) = 1/\mu$, the linear bound. Since $\mu < M$, a nonzero $\sigma$ will cause the denominator of the fraction to decrease proportionately more than the numerator, and hence give a lower bound greater (better) than the linear lower bound. The maximum value of $\sigma$ is $\sqrt{(M - \mu)(\mu - m)}$, whereupon $\delta_M = \mu - m$, and the lower bound is $(1/M)[M - \mu + m]/m = 1/m + 1/M - \mu/mM$, exactly the upper linear bound for reciprocal (see section 3.2).

Again, let's use our standard example of $m = 10$, $M = 100$, $\mu = 23$, $\sigma = 10$, this time for the reciprocal function. Then

$\delta_M = 10^2/(100 - 23) = 1.299$
And a lower bound on the mean of the reciprocals is
$(1/100)(100 - 1.299)/(23 - 1.299) = .04548$
This is better than the linear lower bound, calculated as $1/\mu = .0435$.

We can get an upper quadratic bound by only minor modifications: just create a bounding curve that crosses $1/x$ at $m$ instead of $M$, and is tangent at $t$ in the interval. We just substitute $m$ for $M$ in the preceding formulae, giving

an upper bound of $(\sigma^2 + \mu^2)/t^2m - (2/tm + 1/t^2)\mu + 2/t + 1/m$
taken at $t = \mu + \sigma^2/(\mu - m) = \mu + \delta_m$

which can be written as $(1/m)(m + \delta_m)/(\mu + \delta_m)$, where $\delta_m = \sigma^2/(\mu - m)$.

So for our example data, $t = 23 + 10^2/(23 - 10) = 30.69$, and the upper bound is $1/10 - 13/(10 \cdot 30.69) = .0576$. This is significantly better than the linear upper bound of $(77/90)(.1) + (13/90)(.01) = .0871$. Hence by using a quadratic rather than linear bound we have narrowed the range of the answer by a factor of $(.0576 - .0455)/(.0871 - .0435) = .278$.
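
The two closed-form reciprocal bounds are easy to evaluate directly. The sketch below uses a hypothetical function name and assumes $0 < m < \mu < M$ with $\sigma^2 \le (M - \mu)(\mu - m)$, which holds for any real data set lying in $[m, M]$. It evaluates the running example and then checks the bounds against a small made-up data set.

```python
def reciprocal_quadratic_bounds(mu, sigma, m, M):
    """One-sided quadratic (lower, upper) bounds on the mean of 1/x.

    Assumes 0 < m < mu < M and sigma**2 <= (M - mu) * (mu - m).
    """
    delta_M = sigma ** 2 / (M - mu)
    delta_m = sigma ** 2 / (mu - m)
    lower = (M - delta_M) / (M * (mu - delta_M))
    upper = (m + delta_m) / (m * (mu + delta_m))
    return lower, upper

# Running example: mu = 23, sigma = 10, m = 10, M = 100.
print(reciprocal_quadratic_bounds(23.0, 10.0, 10.0, 100.0))

# Sanity check on an invented data set (population variance, as in the text).
data = [10.0, 15.0, 20.0, 30.0, 100.0]
mean = sum(data) / len(data)
var = sum((x - mean) ** 2 for x in data) / len(data)
low, high = reciprocal_quadratic_bounds(mean, var ** 0.5, min(data), max(data))
true_mean = sum(1.0 / x for x in data) / len(data)
print(low, true_mean, high)
```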

7.2.  Evaluation  of  the  quadratic  reciprocal  bounds 

We can obtain useful approximations of the quadratic bounds by replacing the quotient with the first few terms of its binomial expansion, as here for the lower bound:

$$1/(\mu - \delta_M) \approx (1/\mu)(1 + \delta_M/\mu + \delta_M^2/\mu^2)$$

hence
$$(1/M)(M - \delta_M)/(\mu - \delta_M) \approx 1/\mu + (1/\mu^2 - 1/M\mu)\,\delta_M + (1/\mu^3 - 1/M\mu^2)\,\delta_M^2$$

$$= 1/\mu + \delta_M(1/\mu - 1/M)/\mu + \delta_M^2(1/\mu - 1/M)/\mu^2$$
$$= 1/\mu + \sigma^2/M\mu^2 + \sigma^4/(M - \mu)M\mu^3$$

Hence the difference between the quadratic bounds can be approximated by

$$(1/m - 1/M)\,\sigma^2/\mu^2 + [1/m(m - \mu) - 1/M(M - \mu)]\,\sigma^4/\mu^3$$
$$= [(M - m)\sigma^2/\mu^2]\,[1/mM + (m + M - \mu)\sigma^2/\mu mM(m - \mu)(M - \mu)]$$

As suggested in the previous section, the quadratic bounds are always better than the linear bounds except at the two extreme cases of $\sigma$. We can find the $\mu$ and $\sigma$ for which they are least accurate. Set the partial derivative of the difference between the quadratic bounds to 0:

$$0 = \partial/\partial\mu\,\left[(1/m - 1/M)\,\sigma^2/\mu^2 + [1/m(m - \mu) - 1/M(M - \mu)]\,\sigma^4/\mu^3\right]$$

$$0 = -2(1/m - 1/M)\,\sigma^2/\mu^3 + [1/m(m - \mu)^2 - 1/M(M - \mu)^2]\,\sigma^4/\mu^3 - 3[1/m(m - \mu) - 1/M(M - \mu)]\,\sigma^4/\mu^4$$

$$2(1/m - 1/M) = [1/m(m - \mu)^2 - 1/M(M - \mu)^2]\,\sigma^2 - 3[1/m(m - \mu) - 1/M(M - \mu)]\,\sigma^2/\mu$$
which can be solved iteratively.

7.3.  Intersection  and  tangent  positioning:  cube 

We can do something similar for the cube function:
$$e(x) = x^3 - ax^2 - bx - c$$

which is a third-degree polynomial just like the one for reciprocal. So we can position one intersection point and one tangency point. This time we can write $e(x)$ as
$$e(x) = (x - t)^2(x - M) = x^3 - ax^2 - bx - c$$

hence

$$a = 2t + M, \quad b = -(t^2 + 2tM), \quad c = t^2M$$

so an upper bound on the mean is
$$(2t + M)(\sigma^2 + \mu^2) - (t^2 + 2tM)\mu + t^2M$$
and this is a minimum when we choose a $t$ such that

$$2(\sigma^2 + \mu^2) - (2t + 2M)\mu + 2tM = 0$$
$$[(\sigma^2 + \mu^2) - M\mu] / (\mu - M) = t$$
$$t = \mu - \sigma^2/(M - \mu) = \mu - \delta_M$$

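Substituting this in the equation for the bound, and using $\delta_M(M - \mu) = \sigma^2$:

$$(2(\mu - \delta_M) + M)(\sigma^2 + \mu^2) - ((\mu - \delta_M)^2 + 2(\mu - \delta_M)M)\,\mu + (\mu - \delta_M)^2M$$
$$= (\mu - \delta_M)^2(M - \mu) + (2\mu + M - 2\delta_M)(\sigma^2 + \mu^2) - 2(\mu - \delta_M)M\mu$$
$$= \mu^3 + \delta_M^2(M - \mu) + (2\mu + M - 2\delta_M)\,\sigma^2$$
$$= \mu^3 + \sigma^4/(M - \mu) + (2\mu + M - 2\sigma^2/(M - \mu))\,\sigma^2$$
$$= \mu^3 - \sigma^4/(M - \mu) + (2\mu + M)\,\sigma^2$$
$$= \mu^3 + (2\mu + M - \delta_M)\,\sigma^2$$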

Similarly, a lower bound is

$$(2t + m)(\sigma^2 + \mu^2) - (t^2 + 2tm)\mu + t^2m$$
and this is a maximum when we choose a $t$ such that
$$t = \mu + \sigma^2/(\mu - m) = \mu + \delta_m$$
leading to a lower bound of

$$\mu^3 + (2\mu + m + \delta_m)\,\sigma^2$$

Note the quadratic lower bound is always greater than the linear lower bound, $\mu^3$. The difference between the upper and lower bounds is
$$[M - m - \delta_M - \delta_m]\,\sigma^2$$

which provides a useful criterion for the effectiveness of these bounds. Note this is always nonnegative since

$$M - m - [\delta_M + \delta_m] = M - m - \sigma^2(M - m)/(M - \mu)(\mu - m)$$

$$= (M - m)\,[1 - \sigma^2/(M - \mu)(\mu - m)]$$

The largest possible value of $\sigma^2$ is $(M - \mu)(\mu - m)$, so the quantity in brackets is always nonnegative.
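
The closed-form cube bounds can be evaluated the same way. The function name below is illustrative, and the small data set is invented purely to show that the true mean of the cubes falls between the two bounds; the population variance is assumed, as elsewhere in the text.

```python
def cube_quadratic_bounds(mu, sigma, m, M):
    """One-sided quadratic (lower, upper) bounds on the mean of x**3.

    Assumes m < mu < M.
    """
    delta_M = sigma ** 2 / (M - mu)
    delta_m = sigma ** 2 / (mu - m)
    upper = mu ** 3 + (2.0 * mu + M - delta_M) * sigma ** 2
    lower = mu ** 3 + (2.0 * mu + m + delta_m) * sigma ** 2
    return lower, upper

# Standard example: mu = 23, sigma = 10, m = 10, M = 100.
print(cube_quadratic_bounds(23.0, 10.0, 10.0, 100.0))

# Sanity check on an invented data set.
data = [10.0, 15.0, 20.0, 30.0, 100.0]
mean = sum(data) / len(data)
var = sum((x - mean) ** 2 for x in data) / len(data)
low, high = cube_quadratic_bounds(mean, var ** 0.5, min(data), max(data))
true_mean = sum(x ** 3 for x in data) / len(data)
print(low, true_mean, high)
```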


8.  Optimal  quadratic  bounds 

The problem of finding the best quadratic approximation for our bounding purposes may be viewed as an optimization problem in two variables. Since the quadratic curve $ax^2 + bx + c$ leads to a bound of

upper bound: $a(\sigma^2 + \mu^2) + b\mu + c + \max_{m \le x \le M}[f(x) - ax^2 - bx - c]$
lower bound: $a(\sigma^2 + \mu^2) + b\mu + c + \min_{m \le x \le M}[f(x) - ax^2 - bx - c]$

and the constant $c$ can be moved out of the maximum and minimum, we can write:

upper bound: $a(\sigma^2 + \mu^2) + b\mu + \max_{m \le x \le M}[f(x) - ax^2 - bx]$

lower bound: $a(\sigma^2 + \mu^2) + b\mu + \min_{m \le x \le M}[f(x) - ax^2 - bx]$

So we have two optimization problems for real $a$ and $b$: to find the values that minimize the upper bound, and the values that maximize the lower bound. We have constructed a program that does this by estimating the gradient from exploratory steps, finding the zeros of the derivative of the error function by the quadratic formula for logarithm and cube, and by iterative bisection for antilog, square root, and reciprocal. Comparison with the other obtained bounds is presented later in this paper. Unfortunately, the extrema appear to be "broad", and convergence is slow, so the other methods discussed in this paper seem clearly desirable in most cases. While these other methods cannot usually get the tightest bounds, the difference is usually not much.

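A strong local optimum found by the optimization process is guaranteed to be the global optimum over all quadratic curves, because the upper-bound function being minimized is convex (and, by the same argument, the lower-bound function being maximized is concave). To see this, note for the upper bound for instance, with $0 \le \theta \le 1$,

$$(\theta a_1 + (1 - \theta)a_2)(\sigma^2 + \mu^2) + (\theta b_1 + (1 - \theta)b_2)\mu + \max_{m \le x \le M}[f(x) - (\theta a_1 + (1 - \theta)a_2)x^2 - (\theta b_1 + (1 - \theta)b_2)x]$$

$$\le \theta\left[a_1(\sigma^2 + \mu^2) + b_1\mu + \max_{m \le x \le M}[f(x) - a_1x^2 - b_1x]\right] + (1 - \theta)\left[a_2(\sigma^2 + \mu^2) + b_2\mu + \max_{m \le x \le M}[f(x) - a_2x^2 - b_2x]\right]$$

since $\max(f(x) + g(x)) \le \max(f(x)) + \max(g(x))$.

For our standard example, we found the optimal quadratic bounds to be 3.00 and 3.10.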
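
The program described above is not reproduced in this report. As a rough illustration of the two-variable optimization only, the following Python sketch uses a simple coordinate pattern search, with the inner maximum or minimum taken over a sampling grid. Both choices are assumptions made for this sketch and are not the gradient-estimation method described in the text; the function names, step sizes, and starting point (the rounded Lagrange-Chebyshev coefficients from section 6) are likewise illustrative. A coordinate search on a nonsmooth objective can stall before reaching the optimum, so the result should be read as an improved bound, not necessarily the optimal one.

```python
import math

def bound_value(f, a, b, mu, sigma, m, M, upper=True, n=400):
    """Bound on the mean of f from the curve a*x**2 + b*x (constant absorbed)."""
    errors = []
    for i in range(n + 1):
        x = m + (M - m) * i / n
        errors.append(f(x) - a * x * x - b * x)
    offset = max(errors) if upper else min(errors)
    return a * (sigma ** 2 + mu ** 2) + b * mu + offset

def pattern_search(f, a, b, mu, sigma, m, M, upper=True, iters=60):
    """Improve (a, b): minimize the upper bound or maximize the lower bound."""
    sign = 1.0 if upper else -1.0
    best = sign * bound_value(f, a, b, mu, sigma, m, M, upper)
    step_a = 0.25 * abs(a) if a != 0.0 else 1e-4
    step_b = 0.25 * abs(b) if b != 0.0 else 1e-2
    for _ in range(iters):
        improved = False
        for da, db in ((step_a, 0.0), (-step_a, 0.0), (0.0, step_b), (0.0, -step_b)):
            trial = sign * bound_value(f, a + da, b + db, mu, sigma, m, M, upper)
            if trial < best:
                a, b, best = a + da, b + db, trial
                improved = True
        if not improved:
            # No exploratory step helped: shrink both step sizes.
            step_a /= 2.0
            step_b /= 2.0
    return a, b, sign * best

# Standard example, started from the rounded Lagrange-Chebyshev coefficients.
args = (math.log, -0.0002295, 0.04794, 23.0, 10.0, 10.0, 100.0)
print(pattern_search(*args, upper=True))
print(pattern_search(*args, upper=False))
```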

9.  Improving  accuracy  with  outliers  and  statistics  on  subsets 

We can tighten bounds if we know additional information about a set of data values. We may know a few extreme values on the range (outliers), and be able to remove these points from the analysis of the rest of the points. This helps a good deal when $m$ and/or $M$ are unusually unrepresentative of the distribution (and notice how frequently we have used $m$ and $M$ in our formulas). With the outliers removed, the remaining values can have a narrower range, on which the function can be better matched by a linear or quadratic approximation. The transformed values for the known outliers can then be added to the total mean or total variance in a final step.

But  we  can  generalize  this.  We  can  improve  accuracy  of  bounds  any  time  we  know  means  and  variances  of 
arbitrary  subsets  of  the  original  data  values.  We  may  then  estimate  statistics  on  the  transformed  values  for 
each  subset  and  combine  them  with  the  appropriate  weighting. 

9.1.  An  example

For instance, from [8], there were 6133 merchant ships with United States registry in 1982, of an average gross tonnage of 3120 per ship. Of these, 2941 were fishing vessels, of average tonnage 199.6 gross tons; 548 were cargo ships, of average tonnage 9790 tons; 361 were tankers, of average tonnage 2670 tons. Hence there were 6133 - 2941 - 548 - 361 = 2283 other ships of average tonnage [(6133*3120) - (2941*200) - (548*9740) - (361*2670)] / 2283 = [19,130,000 - 588,000 - 5,340,000 - 965,000] / 2283 = 5320 tons.

Now suppose we want the mean of the logarithms of the tonnage values. Consider the upper bounds on each of the four disjoint subsets. These are just the logarithms of the means, or 5.30, 9.21, 7.88, and 8.57. Hence the total upper bound is the weighted mean of these upper bounds, or [(5.30*2941) + (9.21*548) + (7.88*361) + (8.57*2283)] / 6133 = 7.018. This should be compared with the upper bound derived from the mean of the entire set, ln(3120) = 8.03, so the subdivision data gave us a significant improvement.

Unfortunately, we do not know anything about the maximum and minimum tonnage of classes of ships, so we cannot get a cumulative lower bound. However, we know $m = 100$ for this table, and $M = 200{,}000$ is a reasonable figure from knowledge of merchant shipping, so a global lower bound is found by
$\alpha = (3120 - 100)/(200000 - 100) = .0151$

lower bound is $\ln(100) + \alpha(\ln(200000) - \ln(100)) = 4.60 + .0151 \cdot 7.60$
$= 4.60 + .115 = 4.715$
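
The subset calculation can be sketched as follows. The function names are hypothetical; the subset sizes and mean tonnages are the figures quoted in this example, and the upper bound relies on the transformation being concave downwards (as the logarithm is), so that the function of a subset mean bounds the subset's transformed mean from above.

```python
import math

# (count, mean gross tonnage) for fishing vessels, cargo ships, tankers, others.
subsets = [(2941, 199.6), (548, 9790.0), (361, 2670.0), (2283, 5320.0)]

def weighted_upper_bound(groups, f=math.log):
    """Weighted mean of f(subset mean); an upper bound for concave f."""
    total = sum(count for count, _ in groups)
    return sum(count * f(mean) for count, mean in groups) / total

def secant_lower_bound(mu, m, M, f=math.log):
    """Linear (secant) lower bound on the mean of a concave f over [m, M]."""
    alpha = (mu - m) / (M - m)
    return f(m) + alpha * (f(M) - f(m))

print(weighted_upper_bound(subsets))          # subdivided upper bound
print(math.log(3120.0))                       # upper bound from the overall mean
print(secant_lower_bound(3120.0, 100.0, 200000.0))
```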


9.2.  Proof  of  desirability  of  subdivision  for  linear  bounds

It can be proved that linear bounds on the mean are never worsened by using such subset statistics. This can be seen graphically in figure 9-1. We consider here the case of binary subdivision, and further subdivisions can be covered by extension. We also consider only functions concave downwards, but the other case can be handled analogously.

First consider the lower bound. If the ranges of the subdivisions are the same as the full set, then the two lower bounds must lie along the same line, and their weighted average must lie along the line too; hence the lower bound of the full set is exactly the weighted average of the two lower bounds. If one or both of the subsets has a narrower range of values than the full set, this can only increase (improve) the lower bound since a secant across a subrange lies fully above a secant across a range containing the subrange. Hence the lower bound cannot get any worse in this subdivision summation of linear lower bounds.

The upper bound also cannot be any worse. This time range reduction within a subset does not matter because the upper bound is constrained to lie along the curve of the function, which is independent of where it is sliced. The weighted average of the two subset upper bounds is a point along the line connecting two points on the function curve. But since the function is concave downwards, this point is always below the function. And since the upper bound on the full set is constrained to lie on the curve, the subdivision process always guarantees a better upper bound as long as the two subdivision means are different, and no worse if they are not different.

Figure 9-1: Improvements in linear bounds from combining statistics on two disjoint sets


10.  Exploiting  order  statistics  as  well 

So  far  we  have  only  assumed  knowledge  of  the  maximum,  minimum,  mean,  and  (sometimes)  standard 
deviation  of  sets  of  data  values.  If  we  have  additional  statistics  on  the  data  values  we  can  do  a  better  job  of 
estimating  statistics  on  the  transformed  values.  In  this  section  we  discuss  using  order  statistics  (e.g.  medians 
and  percentiles).  Order  statistics  have  the  nice  property  that  they  have  one-to-one  mappings  from  the  original 
data  values  to  the  transformed  values  under  the  monotonic  transformations  we  are  assuming.

10.1.  Using  the  median 

First, assume we know a median in addition to the maximum, minimum, and mean. We can often get an immediate improvement in the bounds on estimates. Let the error curve (linear, quadratic, or whatever) be $e(x)$. Then the median can be thought to partition the points into two equal-sized subranges (assume the number of points to be large enough so that even numbers of points don't bother us). Then an upper bound on the mean of the transformed values is the estimate given by the approximation curve plus one half the maximum of the error curve in the range to the left of the median plus one half the maximum of the error curve in the range to the right of the median. The lower bound on the mean is found by substituting "minimum" for "maximum" in the above rule. Thus knowing the median decreases the influence of extrema of the error curve.

10.2.  Other  order  statistics 

We can generalize these ideas to the situation where we know arbitrary order statistics on the original distribution. Denote these statistics as $r$ pairs of the form $\langle x_i, f_i \rangle$, where fraction $f_i$ of the items in the distribution are claimed to lie to the left of value $x_i$. Then we can generalize the formula of section 5 as follows:

upper bound is (estimate from approximation curve) $- \sum_{1 \le i \le r} \left[(f_i - f_{i-1}) \cdot \min_{x_{i-1} \le x \le x_i}[e(x)]\right]$
lower bound is (estimate from approximation curve) $- \sum_{1 \le i \le r} \left[(f_i - f_{i-1}) \cdot \max_{x_{i-1} \le x \le x_i}[e(x)]\right]$

where $e(x)$ is the error curve $a(x) - f(x)$, $x_0$ is defined as $m$, with $f_0 = 0$, and $x_r$ is defined as $M$ (with corresponding $f$ of 1). Thus the effects of the extreme points of $e(x)$ are "diluted" by their fractional coefficients, and the more order statistics are known, the tighter the eventual bounds.

Under certain circumstances we can simplify the above formulae considerably. If we know even-subdivision order statistics (i.e., $f_i = i/r$, $r$ the number of order statistics), and if the error curve $e(x)$ is monotonic, then the maximum and minimum of $e(x)$ in each subinterval between the order statistic ordinates $x_i$ must lie at the endpoints. So if $e(x)$ is monotonic increasing, the upper-bound offset is $[\sum_{1 \le i \le r} e(x_i)]/r$ and the lower-bound offset is $[\sum_{1 \le i \le r} e(x_{i-1})]/r$; and vice versa if $e(x)$ is monotonic decreasing. Hence the absolute range between the upper bound and lower bound is always the same number, $|e(x_r) - e(x_0)|/r = |e(M) - e(m)|/r$. (Note that the error curves of Taylor-series quadratic approximations are monotonic if $e(m) < 0 < e(M)$ or $e(M) < 0 < e(m)$, conditions which occur frequently.)
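
The dilution of the error extrema by order statistics can be sketched in Python as below. This is an illustration under assumptions: the function name is hypothetical, the error curve is taken with the sign convention of section 5 (function minus approximation, so offsets are added to the estimate), and the extrema on each subinterval are found by sampling, which is exact at the endpoints when the error curve is monotonic. The data set is invented so that the true mean of the logarithms can be compared with the bounds.

```python
import math

def order_statistic_bounds(estimate, e, breakpoints, fractions, n=200):
    """Bounds on the transformed mean given cumulative order statistics.

    breakpoints: [m, x_1, ..., M]; fractions: [0, f_1, ..., 1], where f_i is
    the fraction of items at or to the left of x_i.
    e: error curve, f(x) minus the approximation (section 5 convention).
    """
    lower = upper = estimate
    for i in range(1, len(breakpoints)):
        lo, hi = breakpoints[i - 1], breakpoints[i]
        weight = fractions[i] - fractions[i - 1]
        samples = [e(lo + (hi - lo) * k / n) for k in range(n + 1)]
        upper += weight * max(samples)
        lower += weight * min(samples)
    return lower, upper

# Invented data; Taylor-series quadratic for ln about the mean.
data = [10.0, 12.0, 18.0, 22.0, 40.0, 100.0]
mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)
estimate = math.log(mu) - 0.5 * var / mu ** 2

def e(x):
    return math.log(x) - (math.log(mu) + (x - mu) / mu - 0.5 * (x - mu) ** 2 / mu ** 2)

true_mean = sum(math.log(x) for x in data) / len(data)
plain = order_statistic_bounds(estimate, e, [10.0, 100.0], [0.0, 1.0])
with_median = order_statistic_bounds(estimate, e, [10.0, 20.0, 100.0], [0.0, 0.5, 1.0])
print(plain, with_median, true_mean)
```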

10.3.  Order  statistics  and  the  standard  deviation 

Order statistics are also helpful in estimating the standard deviation of the transformed values, especially order statistics for the leftmost and rightmost subranges of the interval. Recalling the bounds lines drawn through the mean of the transformed values in section 4.2, we had to draw them so they lay entirely above the curve to one side of the mean, and entirely below on the other side, and this is a highly conservative assumption. Assume $\nu$ is known precisely. We could probably get a better bound if we knew how many points lay to the left of some $x_1$ and then drew a secant of $f(x)$ from the transform mean to it, rather than from the transform mean to $m$; or if we knew how many points lay to the right of some $x_{r-1}$ and drew a secant from the transform mean to it instead of $M$. See figure 10-1.


The estimate of the standard deviation of the transformed values obtained from these lines is just their slope times the original standard deviation. But to get a bound, we need a correction for the points lying more extreme than the new point of intersection. Consider the example of a curve concave downwards like logarithm, and take the upper bound line from the transform mean to some point to the left; call the point $x_1$ and let it be an order statistic so that fraction $p$ of the distribution lies to the left of it. Assume the mean of the transformed values is known exactly. Then the correction for a bound corresponds to the situation where all the $p$ points are at $m$, which means a difference in the variance of
$$p\left[(f(\nu) - f(m))^2 - \left[(\nu - m)(f(\nu) - f(x_1))/(\nu - x_1)\right]^2\right]$$

where $\nu$ is the number which maps functionally to the mean of the transformed values. Hence the expression for the upper bound on the standard deviation is

$$\left[[\sigma^2 + (\mu - \nu)^2]\left[(f(\nu) - f(x_1))/(\nu - x_1)\right]^2 + p\,(f(\nu) - f(m))^2 - p\left[(\nu - m)(f(\nu) - f(x_1))/(\nu - x_1)\right]^2\right]^{.5}$$


So using such a bounds line can give a better slope, but one pays a penalty of a correction term which subtracts from the slope improvement. An obvious question is under what conditions use of the order statistic helps. It turns out this has a surprising answer when $\nu$ is known exactly. Denote the two slopes as $s_m$ and $s_0$, i.e.

$$s_m = (f(\nu) - f(m))/(\nu - m), \quad s_0 = (f(\nu) - f(x_1))/(\nu - x_1)$$

We can rewrite our expression for the upper bound as

$$\left[[\sigma^2 + (\mu - \nu)^2]\,s_0^2 + p\,s_m^2(\nu - m)^2 - p\,s_0^2(\nu - m)^2\right]^{.5}$$

This will represent an improvement on the linear upper bound $\left[[\sigma^2 + (\mu - \nu)^2]\,s_m^2\right]^{.5}$ if

$$[\sigma^2 + (\mu - \nu)^2]\,s_m^2 > [\sigma^2 + (\mu - \nu)^2]\,s_0^2 + p\,s_m^2(\nu - m)^2 - p\,s_0^2(\nu - m)^2$$
or
$$[\sigma^2 + (\mu - \nu)^2]\,[s_m^2 - s_0^2] > p\,[s_m^2 - s_0^2]\,(\nu - m)^2$$

So the slope terms cancel, and use of the order statistic $\langle x_1, p \rangle$ is going to be helpful when:

$$[\sigma^2 + (\mu - \nu)^2] > p\,(\nu - m)^2$$
or
$$p < [\sigma^2 + (\mu - \nu)^2]/(\nu - m)^2$$

This result is independent of where the order statistic is within the distribution ($x_1$), and depends only on the standard deviation and minimum of the original distribution, and the mean of the transformed values. The corresponding result for the rightmost order statistic is
$$p < [\sigma^2 + (\mu - \nu)^2]/(M - \nu)^2$$

where $p$ is the fraction of items to the right of $x_{r-1}$.


If we know other order statistics than just the leftmost and rightmost ($x_1$ and $x_{r-1}$) we can get better bounds, though predicting the improvement is difficult. For instance, if we know $x_2$, we can take a line from $\nu$ to $x_2$, and estimate the contribution to the correction factor from the items between $x_1$ and $x_2$ differently than the contribution of items between $m$ and $x_1$.


10.4.  Adjustment  of  standard  deviation  for  an  inexact  transform  mean 

If we do not know the exact mean of the transformed values, $\varphi = f(\nu)$, we must adjust these results. Let the bounds on $\nu$ implied by the bounds on the transform mean be $\nu_L$ and $\nu_U$, as in section 4.3. Assume $f(x)$ has a negative second derivative. The formula for the upper bound is

$$\left[[\sigma^2 + (\mu - \nu)^2]\left[(f(\nu) - f(x_1))/(\nu - x_1)\right]^2 + p\,(f(\nu) - f(m))^2 - p\left[(\nu - m)(f(\nu) - f(x_1))/(\nu - x_1)\right]^2\right]^{.5}$$

The factor $[\sigma^2 + (\mu - \nu)^2]$ is monotonically decreasing with $\nu$ in its range. The rest of the expression is the difference of a term and the difference of two others. The first term is monotonically decreasing with increasing $\nu$ since the second derivative of the curve is negative. This represents the second moment of $p$ items grouped at $m$ on the curve. As $\nu$ increases, the possible distance these items could be off the bound line increases, and their relative weight increases as $f(\nu)$ becomes relatively larger than $f(m)$. Hence since this correction term is subtracted from the slope, the effect as $\nu$ increases will be for all the terms to decrease. Hence the adjusted value for the upper bound on the standard deviation of the transform values is just

$$\left[[\sigma^2 + (\mu - \nu_L)^2]\left[(f(\nu_L) - f(x_1))/(\nu_L - x_1)\right]^2 + p\,(f(\nu_L) - f(m))^2 - p\left[(\nu_L - m)(f(\nu_L) - f(x_1))/(\nu_L - x_1)\right]^2\right]^{.5}$$

substituting $\nu_L$ for $\nu$ in the exact-$\nu$ formula.


Similarly, we substitute ν_2 for ν to get an adjusted lower bound. Analogously, we handle curves with a positive second derivative by substituting ν_1 for ν for an upper bound, ν_2 for ν for a lower bound.

10.5. Quasi-order statistics from the standard deviation

If we know the mean and standard deviation of a set of data values, we can use Chebyshev's inequality to bound the number of items lying more than a certain distance from the mean. This information is like an order statistic, but since it only represents an upper bound on the number of items in a region and not an exact number of items, it must be used carefully. It can only be used for partitions of the interval of interest into two parts, the subinterval of points farther than a certain distance to the left (or right) of the mean, and a subinterval of all other points of the interval. It can also only be used for an upper bound on the mean of the transformed values, given μ, when e(x) has a maximum on the first subinterval that is more than the maximum on the second, or for a lower bound when e(x) has a minimum on the first subinterval that is less than the minimum on the second.


Actually, Chebyshev's inequality in the standard form (that only a fraction σ²/D² of the points of a distribution can lie greater than distance D units from the mean) is not the best inequality we can get, since it refers to both tails of a distribution, and we are only concerned with the number of points in one tail. Only a fraction σ²/(σ²+D²) of the points can lie to the left of a point D to the left of the mean, or lie to the right of a point D to the right of the mean. To see this, note that if fraction f of the points lie to the left of a point D units to the left of the mean, then their weighted second moment about the mean is at least fD², which cannot exceed σ². But in order for the mean to be at the place it is, this fraction f of the points must be compensated for by (1-f) points R units to the other side of the mean. For maximal f, these other (1-f) points must all be at the same location, for otherwise they would have a nonzero variance about their own mean, which would add to the variance of the whole distribution and would require a lower maximum f. Hence we have two equations to solve simultaneously:

fD² + (1-f)R² = σ²
fD - (1-f)R = 0

which imply

R = Df/(1-f),  fD²/(1-f) = σ²,  f = σ²/(σ²+D²)
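
As a quick numerical check of this derivation, the following sketch (in Python; the function names are ours and purely illustrative) builds the extremal two-point distribution, with fraction f of the points at μ-D and fraction 1-f at μ+R, and confirms that it has the required mean and variance.

```python
def one_sided_fraction(sigma, D):
    """Largest fraction of points that can lie D or more to one side of the mean."""
    return sigma ** 2 / (sigma ** 2 + D ** 2)


def extremal_two_point(mu, sigma, D):
    """Two-point distribution attaining the one-sided bound, as (value, weight) pairs."""
    f = one_sided_fraction(sigma, D)
    R = D * f / (1 - f)  # offset of the compensating points, from fD - (1-f)R = 0
    return [(mu - D, f), (mu + R, 1 - f)]


def mean_and_variance(points):
    """Mean and variance of a discrete distribution given as (value, weight) pairs."""
    mean = sum(x * w for x, w in points)
    var = sum(w * (x - mean) ** 2 for x, w in points)
    return mean, var


if __name__ == "__main__":
    pts = extremal_two_point(mu=23.0, sigma=10.0, D=5.0)
    print(pts, mean_and_variance(pts))
```

With σ = 10, taking D = 10 gives f = .5 and D = 5 gives f = .8; in both cases the two-point distribution reproduces the mean μ and the variance σ².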


Using this result, we then can put bounds on the mean of the transformed values of

upper bound: f(μ) + .5σ²f''(μ)
+ (σ²/(σ²+D²)) * max_{m≤x≤μ-D} (e(x))
+ (D²/(σ²+D²)) * max_{μ-D≤x≤M} (e(x)),
provided the first max value is greater than the second

lower bound: f(μ) + .5σ²f''(μ)
+ (σ²/(σ²+D²)) * min_{m≤x≤μ-D} (e(x))
+ (D²/(σ²+D²)) * min_{μ-D≤x≤M} (e(x)),
provided the first min value is less than the second

These are the left-sided bounds; we can also get analogous expressions for bounds using points on the right of a distribution. Unfortunately, we cannot in general find optimal values of D for these formulas in closed form, because the derivative cannot be applied to the max and min terms.

Note that while it may be difficult to determine for an arbitrary e(x) whether the maximum in one interval is greater than in another, the Taylor-series quadratic approximation often has this property for either the left-side or right-side rule.
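
The left-sided rule can be sketched in code as follows. This is only an illustration under stated assumptions: we take e(x) to be f(x) minus the quadratic Taylor approximation around μ, we approximate the maxima and minima of e(x) on each subinterval by sampling a grid (so the result is only as good as the grid), and the function name and parameters are ours. A bound is returned as None when its proviso fails.

```python
import math


def left_sided_bounds(f, d1f, d2f, mu, sigma, m, M, D, steps=2000):
    """Left-sided quasi-order-statistic bounds (lower, upper) on the mean of f(x).

    f, d1f, d2f are the transformation and its first two derivatives;
    requires m < mu - D < M.
    """
    def e(x):
        # Assumed sign convention: error = f minus its quadratic Taylor series at mu.
        dx = x - mu
        return f(x) - (f(mu) + d1f(mu) * dx + 0.5 * d2f(mu) * dx * dx)

    def extremes(a, b):
        # Grid approximation to (min, max) of e on [a, b].
        vals = [e(a + (b - a) * k / steps) for k in range(steps + 1)]
        return min(vals), max(vals)

    frac = sigma ** 2 / (sigma ** 2 + D ** 2)  # most that can lie left of mu - D
    base = f(mu) + 0.5 * sigma ** 2 * d2f(mu)  # mean of the Taylor quadratic
    min_l, max_l = extremes(m, mu - D)
    min_r, max_r = extremes(mu - D, M)
    lower = base + frac * min_l + (1 - frac) * min_r if min_l <= min_r else None
    upper = base + frac * max_l + (1 - frac) * max_r if max_l >= max_r else None
    return lower, upper


if __name__ == "__main__":
    xs = [10, 12, 15, 18, 20, 22, 25, 30, 40, 100]  # made-up data for illustration
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    print(left_sided_bounds(math.log, lambda x: 1 / x, lambda x: -1 / x ** 2,
                            mu, sigma, 10, 100, D=5.0))
```

For the logarithm the error curve is increasing, so under this sign convention only the lower bound's proviso holds for the left-sided rule and the upper bound comes back as None.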

10.6.  Evaluation  of  quasi-order  statistics  from  the  standard  deviation 

Let us return to the analysis in section 5.3 of our standard example with the quadratic Taylor series approximation at μ. Choose as subintervals 10<x<33 and 33<x<100, so D = 33-23 = 10 = σ. Since the error curve is monotonically increasing (e(m) < e(μ) < e(M), and e'(x) = 0 nowhere except at μ) the maxima on the subintervals are at the rightmost points, and the minima at the leftmost. Hence the maxima are e(33) = 3.50-(3.14+.435-.106) = .03 and e(100) = 3.7. Similarly for the other bound, choose D = 5, 10<x<18, and 18<x<100; and the minima are e(10) = -.12 and e(18) = 2.89-(3.14-.217-.023) = -.01. The maximum fraction f for x = 33 is 10²/(10²+10²) = .5, and for x = 18 is 10²/(10²+5²) = .8. Hence the revised bounds on the mean of the transformed values are

lower: 3.06 - .5*.03 - .5*3.7 = 1.20
upper: 3.06 - .8*(-.12) - .2*(-.01) = 3.16

which are better than the bounds obtained in section 5.3.

D is a parameter here that can vary arbitrarily. Let us find the best value for it, for the case of a Taylor series approximation where e(x) increases with x, and a lower bound:

0 = ∂/∂D [(σ²/(σ²+D²))*e(μ-D) + (D²/(σ²+D²))*e(M)]

0 = ∂/∂D [(σ²*e(μ-D) + D²*e(M)) / (σ²+D²)]

so σ² ∂/∂D [f(μ-D) - f(μ) + Df'(μ) - .5D²f''(μ)] + 2D*e(M)
= [σ²[f(μ-D) - f(μ) + Df'(μ) - .5D²f''(μ)] + D²e(M)] * 2D / (σ²+D²)

Hence σ²[-f'(μ-D) + f'(μ) - Df''(μ)] + 2D*e(M)
= [σ²e(μ-D) + D²e(M)] * 2D / (σ²+D²)

or 2D[e(M) - e(μ-D)] = (σ²+D²)[f'(μ-D) - f'(μ) + Df''(μ)]

which we can solve by iterative methods to find the best value of D.

10.7.  Splines  and  order  statistics 

We have not referred to spline approximations in the preceding analysis because if an approximation curve is divided into pieces with different properties then we must know how many data points are in each to calculate means and standard deviations on the transformed values. One might think that for a given set of order statistics on a distribution we may be able to create a spline approximation broken at the points at which the order statistics are sited, and use that for bounding. But we still need to know means of every subinterval, the knowledge discussed in section 9, which may be difficult to obtain. Thus splines may be difficult to use.

11. Using fits to known distributions

As  a  final  kind  of  information  which  we  might  have  about  a  set  of  values,  we  might  know  that  their 
distribution  is  close  to  some  well-known  distribution,  with  a  certain  allowed  tolerance.  If  the  tolerance  is 
small  we  can  expect  quite  tight  bounds  on  the  transformed  values.  But  estimating  statistics  this  way  requires 
special  preparation  in  advance  (namely,  measuring  fits  to  a  predicted  distribution),  and  is  not  possible  with 
most  data  presented  in  already-aggregated  units. 

11.1. General formula for known distributions

A well-known result (e.g. [3], section 7.3) gives the distribution of the transform of some probability distribution p(x), under the transformation function f(x), as

q(y) = p(f⁻¹(y)) * |df⁻¹(y)/dy|

as a function of y, provided f is either monotonically increasing or decreasing in the interval.

So for instance if our p(x) approximates a uniform distribution on the interval m to M, q(y) = (1/(M-m)) * |df⁻¹(y)/dy|. For f(x) = ln(x), q(y) = e^y/(M-m) on the interval y = ln(m) to y = ln(M); an estimate of the mean of q(y) is

∫y q(y)dy / ∫q(y)dy = [(ln(M)-1)M - (ln(m)-1)m] / (M-m) = -1 + [M ln(M) - m ln(m)]/(M-m)

and an estimate of the second moment about zero is

∫y² q(y)dy / ∫q(y)dy = [M[ln(M)*ln(M) - 2 ln(M) + 2] - m[ln(m)*ln(m) - 2 ln(m) + 2]] / (M-m)

which minus the square of the estimate of the mean gives an estimate of the variance.

For p(x) uniform, f(x) = 1/x, q(y) = 1/(y²(M-m)) on the interval y = 1/M to y = 1/m; an estimate of the mean of q(y) is

[ln(1/m) - ln(1/M)] / (M-m) = ln(M/m)/(M-m)

and an estimate of the second moment about zero is (1/m - 1/M)/(M-m) = 1/mM, hence an estimate of the variance is

1/mM - [ln(M/m)/(M-m)]²
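
These closed forms are easy to check numerically. The sketch below (Python; all names are ours) implements the mean and variance formulae for the logarithm and the reciprocal of a uniformly distributed x, and compares them with a midpoint-rule average of f(x) and f(x)² over the interval m to M.

```python
import math


def log_uniform_moments(m, M):
    """Mean and variance of ln(x) for x uniform on [m, M], 0 < m < M."""
    mean = (M * math.log(M) - m * math.log(m)) / (M - m) - 1
    second = (M * (math.log(M) ** 2 - 2 * math.log(M) + 2)
              - m * (math.log(m) ** 2 - 2 * math.log(m) + 2)) / (M - m)
    return mean, second - mean ** 2


def reciprocal_uniform_moments(m, M):
    """Mean and variance of 1/x for x uniform on [m, M], 0 < m < M."""
    mean = math.log(M / m) / (M - m)
    return mean, 1 / (m * M) - mean ** 2


def numeric_moments(f, m, M, steps=20000):
    """Midpoint-rule estimate of the mean and variance of f(x), x uniform on [m, M]."""
    h = (M - m) / steps
    ys = [f(m + (k + 0.5) * h) for k in range(steps)]
    mean = sum(ys) / steps
    return mean, sum(y * y for y in ys) / steps - mean ** 2


if __name__ == "__main__":
    print(log_uniform_moments(10, 100), numeric_moments(math.log, 10, 100))
    print(reciprocal_uniform_moments(10, 100), numeric_moments(lambda x: 1 / x, 10, 100))
```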

11.2. Handling inexact fits to distributions

We have not addressed how to get bounds on means and standard deviations. We can do this by defining an "upper fit" ω_u and "lower fit" ω_l on the discrete set of n values x_i such that

ω_u = max_i [x_i - g_i],  ω_l = min_i [x_i - g_i]

where the integral of p(x)dx from -∞ to g_i equals (i-.5)/n, and p(x) is the distribution the x_i fit to.

In other words, the fits are the maximum and minimum deviations of an x_i from its value predicted by the approximating distribution p(x).
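
As an illustration of these definitions, the sketch below (Python; the function name is ours) computes the two fits for the special case where the approximating distribution is uniform on a known interval, so that the predicted position of the i-th smallest value is simply the (i-.5)/n quantile of that interval. Other distributions would need their own quantile function.

```python
def fits_to_uniform(xs, lo, hi):
    """Return (upper fit, lower fit) of the values xs against a uniform distribution on [lo, hi].

    The i-th smallest value (i = 1..n) is compared with the (i - .5)/n quantile
    of the uniform distribution; the fits are the largest and smallest deviations.
    """
    xs = sorted(xs)
    n = len(xs)
    deviations = []
    for i, x in enumerate(xs, start=1):
        predicted = lo + (hi - lo) * (i - 0.5) / n
        deviations.append(x - predicted)
    return max(deviations), min(deviations)


if __name__ == "__main__":
    # Made-up data: four values compared with a uniform distribution on 10 to 100.
    print(fits_to_uniform([22, 45, 65, 90], 10, 100))
```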

We can exploit the assumed fact that f(x) is monotonically increasing or decreasing to say that the maximum and minimum of the mean of the transformed values occur when the x_i are all at ω_u or all at ω_l from their predicted positions, not necessarily respectively. This is because less than an extreme deviation for one point cannot improve prospects for a more extreme mean; all point deviations are independent of one another, within the tolerances. Hence to find the extreme values of the transformed mean one just calculates the means of

q_u(y) = p[f⁻¹(y) - ω_u] * |df⁻¹(y)/dy|  and
q_l(y) = p[f⁻¹(y) - ω_l] * |df⁻¹(y)/dy|

We can use this same approach to get bounds on the standard deviation in the manner of section 4.1. We just define g(x) = [f(x)]² as a new transformation function, and compute the above formulae with g instead of f. We then compute bounds on the mean, square them, and subtract this interval from the interval computed on the mean of g(x).

11.3. Example of inexact distribution fit

Suppose we know the distribution of x fits an even distribution on the interval 10 to 100, to such an extent that a point is never further than 2 units in advance of where it would be in a perfectly even distribution, and never more than 3 units behind. Then the maximum-mean distribution is a uniform distribution from 12 to 102, and the minimum-mean distribution is a uniform distribution from 7 to 97. Suppose we want to find the mean of the logarithms of these data values. Using the formulae we obtained in section 11.1, the mean of the first distribution is [102 ln(102) - 12 ln(12) - 102 + 12]/(102-12) = (471.7 - 29.8)/90 - 1 = 4.91 - 1 = 3.91; and the mean of the second distribution is [97 ln(97) - 7 ln(7) - 97 + 7]/(97-7) = (443.7 - 13.6)/90 - 1 = 4.78 - 1 = 3.78. Hence the mean of the transformed values is between 3.78 and 3.91, corresponding to antilogs of 44 and 50. Note the mean of the original values must lie between (102+12)/2 = 57 and (97+7)/2 = 52.

For an estimate of the standard deviation we use the formula previously derived for an estimate of the second moment about zero, namely

[M[ln(M)*ln(M) - 2 ln(M) + 2] - m[ln(m)*ln(m) - 2 ln(m) + 2]] / (M-m)
= [M(ln(M)-1)² - m(ln(m)-1)²]/(M-m) + 1

For the uniform distribution 12 to 102, this is

[102(3.625)² - 12(1.485)²]/90 + 1 = (1340.3 - 26.5)/90 + 1 = 15.60

and for the uniform distribution 7 to 97 this is

[97(3.575)² - 7(.946)²]/90 + 1 = (1239.7 - 6.3)/90 + 1 = 14.70

From the previous paragraph we know bounds on the mean of the transformed values are 3.78 and 3.91, hence bounds on the square of the mean are 14.28 and 15.29. Hence bounds on the variance are 15.60 - 14.28 = 1.32 and max(14.70 - 15.29, 0) = 0. Hence bounds on the standard deviation of the transformed values are 1.15 and 0.
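
The arithmetic of this example can be recomputed directly from the section 11.1 formulae. The sketch below (Python; function and parameter names are ours) assumes, as in the example, that the underlying distribution is uniform and that the logarithm is the transformation, so both extreme cases are shifted uniform distributions. Values computed this way at full precision may differ from hand-rounded figures.

```python
import math


def log_mean_uniform(m, M):
    """Mean of ln(x) for x uniform on [m, M]."""
    return (M * math.log(M) - m * math.log(m)) / (M - m) - 1


def log_second_moment_uniform(m, M):
    """Second moment about zero of ln(x) for x uniform on [m, M]."""
    def term(x):
        return x * (math.log(x) - 1) ** 2
    return (term(M) - term(m)) / (M - m) + 1


def log_bounds_for_shifted_uniform(m, M, ahead, behind):
    """Bounds on the mean and standard deviation of ln(x) when every point lies
    at most `ahead` above and at most `behind` below its uniform position."""
    hi = (m + ahead, M + ahead)    # maximum-mean distribution
    lo = (m - behind, M - behind)  # minimum-mean distribution
    mean_lo, mean_hi = log_mean_uniform(*lo), log_mean_uniform(*hi)
    sq_lo, sq_hi = log_second_moment_uniform(*lo), log_second_moment_uniform(*hi)
    # Interval arithmetic: (interval on mean of squares) minus (interval on squared mean).
    var_hi = sq_hi - mean_lo ** 2
    var_lo = max(sq_lo - mean_hi ** 2, 0.0)
    return (mean_lo, mean_hi), (math.sqrt(var_lo), math.sqrt(var_hi))


if __name__ == "__main__":
    print(log_bounds_for_shifted_uniform(10, 100, ahead=2, behind=3))
```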

12.  Small  populations 

Thus far we have not made use of the size of the data population being analyzed. This is only significant if the population is particularly small, in which case the known maximum M and minimum m (and the median and mode too, if known) are a nonnegligible proportion of the points of the distribution. For instance, the linear bounds represent in general the two extreme cases where (a) all the points are grouped at the mean, and (b) all the points are at the maximum and the minimum. Knowledge of M and m thus decreases the distance between linear bounds by a factor of 2/n, n the size of the data population, since it represents a weighted modification of case (a) by two points from case (b).

13.  Some  experimental  comparisons  of  the  various  bounds  formulae 

We have run some simple experiments on the effectiveness of our bounds formulae for the mean of the transformed values. We wrote programs in INTERLISP-VAX. We used two test functions, f(x) = ln(x) and f(x) = 1/x. For the experiments we computed upper and lower bounds derived the following ways:

• Simple linear bounds (section 3)

• Taylor-series quadratic bounds, series around the mean (section 5)

• Lagrange-Chebyshev interpolation quadratic bounds (section 6)

• For the reciprocal only, the one-sided quadratic bounds (section 7)

• Order-statistic bounds from the Chebyshev inequality, using a Taylor series around the mean (section 10.5)

• Best quadratic bounds found by explicit optimization on quadratic coefficients a and b (section 8):

upper bound: a(σ²+μ²) + bμ + c + max_{m≤x≤M} [f(x) - ax² - bx - c]
lower bound: a(σ²+μ²) + bμ + c + min_{m≤x≤M} [f(x) - ax² - bx - c]

We discovered that our results for optimal bounds for the reciprocal curve were identical (except for roundoff error) to those for one-sided bounds, so we have omitted the former from the reciprocal table. Unfortunately, we have been unable to prove the connection (that is, that the one-sided bounds are indeed the optimal ones), though we strongly suspect it.

Results are contained in figures 13-1 and 13-2. Since the closed-form expressions are simple computations, in a computer implementation it is advisable to try all the different bounds methods, and take the minimum of the upper bounds to get a cumulative upper bound, and the maximum of the lower bounds to get a cumulative lower bound.
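
A minimal sketch of this combination step follows (Python; the names are ours). To keep it self-contained it uses only two simple stand-in methods for a concave increasing function: the trivial range bounds f(m) and f(M), and a chord-and-tangent pair (the chord through the endpoints evaluated at μ as the lower bound, f(μ) as the upper). These stand-ins are assumptions for illustration and are not necessarily the exact formulae of the earlier sections.

```python
import math


def combine_bounds(intervals):
    """Combine (lower, upper) pairs from several methods into one interval.

    A method that cannot supply one side may give None for it.
    """
    lowers = [lo for lo, hi in intervals if lo is not None]
    uppers = [hi for lo, hi in intervals if hi is not None]
    return max(lowers), min(uppers)


def range_bounds(f, m, M):
    """Trivial bounds for an increasing f: the mean lies between f(m) and f(M)."""
    return f(m), f(M)


def chord_tangent_bounds(f, mu, m, M):
    """For concave f: chord through (m, f(m)) and (M, f(M)) below, f(mu) above."""
    chord = f(m) + (f(M) - f(m)) * (mu - m) / (M - m)
    return chord, f(mu)


if __name__ == "__main__":
    methods = [range_bounds(math.log, 10, 100),
               chord_tangent_bounds(math.log, 23, 10, 100)]
    print(combine_bounds(methods))
```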

14.  Application  to  correlated  data 

An application of these ideas is to estimation of statistics of one attribute from those of another if the attributes are known to have a nonlinear correlation describable by a monotonic function such as we have been analyzing. We can then bound statistics on one attribute from statistics on the other.

15.  Direct  optimization 

We should note there is another kind of optimization that can be applied to problems of this sort. We can make the optimization variables the values themselves of an unknown distribution and perform a constrained optimization with objective function the statistic on which bounds are desired, and with constraints the values of known other statistics. Conceptually, this is a nice approach since it can be applied to arbitrary states of prior knowledge and can bound arbitrary statistics.

We have done a number of experiments which we do not have the space here to discuss, and the idea seems to work. However, we have found that this "direct optimization" is highly sensitive to optimization methods, starting points, and step sizes, and is surprisingly difficult to get convergence for: unlike quadratic optimization, the function optimized is not usually convex. But there is an even more serious problem with direct optimization, a very fundamental one: it only gives lower bounds on upper bounds, and upper bounds on lower bounds, unlike all the other bounds discussed in this paper which are upper bounds on upper bounds, and lower bounds on lower bounds. For instance, for our standard example we found a lower bound on the upper bound of 3.00771 on the mean of the logarithms from direct optimization, but we have no idea how much larger a bound is possible up to the quadratic-optimization bound of 3.10383 which represents an absolute limit. Thus the utility of direct optimization is questionable in bounded statistical estimation, and we do not see it as a challenge to the methods developed in this paper. (It does provide a useful tool for debugging the methods, however, since for instance any supposed bound we find less than the upper bound on the lower bound is in error.)
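
The one-sidedness of direct optimization can be seen in a much-simplified stand-in for it, sketched below in Python. Instead of optimizing over all data values, this sketch searches only over two-point distributions that match a given mean μ and standard deviation σ and stay inside the range m to M; the restriction to two points, the grid search, and all names are our assumptions, not the method used in the experiments. Every feasible distribution found yields a value of the statistic that the true extreme can only equal or exceed, so the largest value found is a lower bound on the upper bound, and the smallest an upper bound on the lower bound.

```python
import math


def two_point_search(f, mu, sigma, m, M, steps=10000):
    """Search two-point distributions with mean mu and standard deviation sigma
    supported inside [m, M]; return the (smallest, largest) mean of f found."""
    d_min = sigma ** 2 / (M - mu)  # below this the upper point would exceed M
    d_max = mu - m                 # above this the lower point would fall below m
    values = []
    for k in range(steps + 1):
        d1 = d_min + (d_max - d_min) * k / steps  # lower point sits at mu - d1
        d2 = sigma ** 2 / d1                      # upper point at mu + d2; variance = d1 * d2
        w = d2 / (d1 + d2)                        # weight on the lower point keeps the mean at mu
        values.append(w * f(mu - d1) + (1 - w) * f(mu + d2))
    return min(values), max(values)


if __name__ == "__main__":
    # Mean 23, standard deviation 10, range 10 to 100, logarithm transformation.
    print(two_point_search(math.log, mu=23.0, sigma=10.0, m=10.0, M=100.0))
```

Any closed-form upper bound that came out below the largest value found here, or any lower bound above the smallest, would have to be in error, which is the debugging use mentioned above.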


16.  Conclusion 

We have developed some quick closed-form expressions for bounds on the mean and standard deviation of a finite set of transformed numerical data values, where the transformation function has derivatives of constant sign in the interval of interest. In making these estimates we use only statistics on the original set of data values, and no actual values themselves. Our bounds provide a useful alternative to often difficult-to-obtain confidence intervals, requiring no distributional assumptions whatsoever. Such bounds are likely to be helpful for exploratory data analysis as an aid to getting a feel for the data, preliminary to detailed hypothesis testing.